Matrix analysis 


ROGER A. HORN 
The Johns Hopkins University 


CHARLES R. JOHNSON 
College of William and Mary 


The right of the 
University of Cambridge 
to print and sell
all manner of books 
was granted by 
Henry VIII in 1534.


The University has printed 
and published continuously 
since 1584. 


CAMBRIDGE UNIVERSITY PRESS 


Cambridge 
London New York New Rochelle 
Melbourne Sydney 


To the matrix theory community 
and 
to our families 


Dana, Jennifer, and Emily 
Susan, Ceres, Corinne, and Howard 


for their understanding support 


Published by the Press Syndicate of the University of Cambridge 
The Pitt Building, Trumpington Street, Cambridge CB2 1RP


40 West 20th Street, New York, NY 10011, USA 


10 Stamford Road, Oakleigh, Melbourne 3166, Australia 


© Cambridge University Press 1985 


First published 1985 

Reprinted with corrections 1987 
Reprinted 1988 

Reprinted with corrections 1990 


Printed in the United States of America 


Library of Congress Cataloging in Publication Data 
Horn, Roger A. 

Matrix analysis. 

Bibliography: p. 

Includes index. 

1. Matrices. I. Johnson, Charles R. II. Title
QA188.H66 1985 512.9'434 85-7736


ISBN 0-521-30586-1 hard covers 
ISBN 0-521-38632-2 paperback 


Contents


Preface


Chapter 0 Review and miscellanea

0.0 Introduction
0.1 Vector spaces
0.2 Matrices
0.3 Determinants
0.4 Rank
0.5 Nonsingularity
0.6 The usual inner product
0.7 Partitioned matrices
0.8 Determinants again
0.9 Special types of matrices
0.10 Change of basis


Chapter 1 Eigenvalues, eigenvectors, and similarity

1.0 Introduction
1.1 The eigenvalue-eigenvector equation
1.2 The characteristic polynomial
1.3 Similarity
1.4 Eigenvectors


Chapter 2 Unitary equivalence and normal matrices

2.0 Introduction
2.1 Unitary matrices
2.2 Unitary equivalence
2.3 Schur’s unitary triangularization theorem
2.4 Some implications of Schur’s theorem
2.5 Normal matrices
2.6 QR factorization and algorithm


Chapter 3 Canonical forms

3.0 Introduction
3.1 The Jordan canonical form: a proof
3.2 The Jordan canonical form: some observations and applications
3.3 Polynomials and matrices: the minimal polynomial
3.4 Other canonical forms and factorizations
3.5 Triangular factorizations


Chapter 4 Hermitian and symmetric matrices

4.0 Introduction
4.1 Definitions, properties, and characterizations of Hermitian matrices
4.2 Variational characterizations of eigenvalues of Hermitian matrices
4.3 Some applications of the variational characterizations
4.4 Complex symmetric matrices
4.5 Congruence and simultaneous diagonalization of Hermitian and
    symmetric matrices
4.6 Consimilarity and condiagonalization


Chapter 5 Norms for vectors and matrices

5.0 Introduction
5.1 Defining properties of vector norms and inner products
5.2 Examples of vector norms
5.3 Algebraic properties of vector norms
5.4 Analytic properties of vector norms
5.5 Geometric properties of vector norms
5.6 Matrix norms
5.7 Vector norms on matrices
5.8 Errors in inverses and solutions of linear systems


Chapter 6 Location and perturbation of eigenvalues

6.0 Introduction
6.1 Geršgorin discs
6.2 Geršgorin discs - a closer look
6.3 Perturbation theorems
6.4 Other inclusion regions


Chapter 7 Positive definite matrices

7.0 Introduction
7.1 Definitions and properties
7.2 Characterizations
7.3 The polar form and the singular value decomposition
7.4 Examples and applications of the singular value decomposition
7.5 The Schur product theorem
7.6 Congruence: products and simultaneous diagonalization
7.7 The positive semidefinite ordering
7.8 Inequalities for positive definite matrices


Chapter 8 Nonnegative matrices

8.0 Introduction
8.1 Nonnegative matrices - inequalities and generalities
8.2 Positive matrices
8.3 Nonnegative matrices
8.4 Irreducible nonnegative matrices
8.5 Primitive matrices
8.6 A general limit theorem
8.7 Stochastic and doubly stochastic matrices


Appendices

A Complex numbers
B Convex sets and functions
C The fundamental theorem of algebra
D Continuous dependence of the zeroes of a polynomial on its coefficients
E Weierstrass’s theorem


References
Notation
Index


Preface 


Linear algebra and matrix theory have long been fundamental tools in 
mathematical disciplines as well as fertile fields for research in their own 
right. In this book, and in the companion volume, Topics in Matrix Ana- 
lysis, we present classical and recent results of matrix analysis that have 
proved to be important to applied mathematics. The book may be used 
as an undergraduate or graduate text and as a self-contained reference 
for a variety of audiences. We assume background equivalent to a one- 
semester elementary linear algebra course and knowledge of rudimentary 
analytical concepts. We begin with the notions of eigenvalues and eigen- 
vectors; no prior knowledge of these concepts is assumed. 

Facts about matrices, beyond those found in an elementary linear alge- 
bra course, are necessary to understand virtually any area of mathemati- 
cal science, whether it be differential equations; probability and statistics; 
optimization; or applications in theoretical and applied economics, the 
engineering disciplines, or operations research, to name only a few. But 
until recently, much of the necessary material has occurred sporadically 
(or not at all) in the undergraduate and graduate curricula. As interest in 
applied mathematics has grown and more courses have been devoted to 
advanced matrix theory, the need for a text offering a broad selection of 
topics has become more apparent, as has the need for a modern reference 
on the subject. 

There are a number of well-loved classics in matrix theory, but they 
are not well suited for general classroom use, nor for systematic individ- 
ual study. A lack of problems, applications, and motivation; an inade- 
quate index; and a dated approach are among the difficulties confronting 
readers of some traditional references. More recent books tend to be either
elementary texts or treatises devoted to special topics. Our goal was to
write a book that would be a useful modern treatment of a broad range 
of topics. 

One view of “matrix analysis” is that it consists of those topics in
linear algebra that have arisen out of the needs of mathematical analysis, 
such as multivariable calculus, complex variables, differential equations, 
optimization, and approximation theory. Another view is that matrix 
analysis is an approach to real and complex linear algebraic problems 
that does not hesitate to use notions from analysis - such as limits, con- 
tinuity, and power series - when these seem more efficient or natural than 
a purely algebraic approach. Both views of matrix analysis are reflected 
in the choice and treatment of topics in this book. We prefer the term 
matrix analysis to linear algebra as an accurate reflection of the broad 
scope and methodology of the field. 

For review and convenience in reference, Chapter 0 contains a sum- 
mary of necessary facts from elementary linear algebra, as well as other 
useful, though not necessarily elementary, facts. Chapters 1, 2, and 3 
contain mainly core material likely to be included in any second course in 
linear algebra or matrix theory: a basic treatment of eigenvalues, eigen- 
vectors, and similarity; unitary similarity, Schur triangularization and its 
implications, and normal matrices; and canonical forms and factoriza- 
tions including the Jordan form, LU factorization, QR factorization, 
and companion matrices. Beyond this, each chapter is developed sub- 
stantially independently and treats in some depth a major topic: 


Hermitian and complex symmetric matrices (Chapter 4). We give 
special emphasis to variational methods for studying eigenvalues 
of Hermitian matrices and include an introduction to the notion 
of majorization. 


Norms on vectors and matrices (Chapter 5) are essential for er- 
ror analyses of numerical linear algebraic algorithms and for the 
study of matrix power series and iterative processes. We discuss 
the algebraic, geometric, and analytic properties of norms in some 
detail, and make a careful distinction between those norm results 
for matrices that depend on the submultiplicativity axiom for 
matrix norms and those that do not. 


Eigenvalue location and perturbation results (Chapter 6) for gen- 
eral (not necessarily Hermitian) matrices are important for many 
applications. We give a detailed treatment of the theory of Gerš- 
gorin regions, and some of its modern refinements, and of rele- 
vant graph theoretic concepts. 




Positive definite matrices (Chapter 7) and their applications, in- 
cluding inequalities, are considered at some length. A discussion 
of the polar and singular value decompositions is included, along 
with applications to matrix approximation problems. 


Component-wise nonnegative and positive matrices (Chapter 8) 
arise in many applications in which nonnegative quantities nec- 
essarily occur (probability, economics, engineering, etc.), and 
their remarkable theory reflects the applications. Our develop- 
ment of the theory of nonnegative, positive, primitive, and irre- 
ducible matrices proceeds in elementary steps based upon the 
use of norms. 


In the companion volume, further topics of similar interest are treated: 
the field of values and generalizations; inertia, stable matrices, M-matrices 
and related special classes; matrix equations, Kronecker and Hadamard 
products; and various ways in which functions and matrices may be 
linked. 

This book provides the basis for a variety of one- or two-semester 
courses through selection of chapters and sections appropriate to a par- 
ticular audience. We recommend that an instructor make a careful pre- 
selection of sections and portions of sections of the book for the needs of 
a particular course. This would probably include Chapter 1, much of 
Chapters 2 and 3, and facts about Hermitian matrices and norms from 
Chapters 4 and 5. 

Most chapters contain some relatively specialized or nontraditional 
material. For example, Chapter 2 includes not only Schur’s basic theorem 
on unitary triangularization of a single matrix, but also a discussion of 
simultaneous triangularization of families of matrices. In the section on 
unitary equivalence, our presentation of the usual facts is followed by a 
discussion of trace conditions for two matrices to be unitarily equivalent. 
A discussion of complex symmetric matrices in Chapter 4 provides a 
counterpoint to the development of the classical theory of Hermitian 
matrices. Basic aspects of a topic appear in the initial sections of each 
chapter, while more elaborate discussions occur at the ends of sections or 
in later sections. This strategy has the advantage of presenting topics in a
sequence that enhances the book’s utility as a reference. It also provides 
a rich variety of options to the instructor. 

Many of the results discussed hold or can be generalized to hold for 
matrices over other fields or in some broader algebraic setting. However, 
we deliberately confine our domain to the real and complex fields where 
familiar methods of classical analysis as well as formal algebraic tech- 
niques may be employed. 




Though we generally consider matrices to have complex entries, most 
examples are confined to real matrices, and no deep knowledge of com- 
plex analysis is required. Acquaintance with the arithmetic of complex 
numbers is necessary for an understanding of matrix analysis and is 
covered to the extent necessary in an appendix. Other brief appendices 
cover several peripheral, but essential, topics such as Weierstrass’s theo- 
rem and convexity. 

We have included many exercises and problems because we feel these 
are essential to the development of an understanding of the subject and 
its implications. The exercises occur throughout as part of the develop- 
ment of each section; they are generally elementary and of immediate use 
in understanding the concepts. We recommend that the reader work at 
least a broad selection of these. Problems are listed (in no particular 
order) at the end of each section; they cover a range of difficulties and 
types (from theoretical to computational) and they may extend the topic, 
develop special aspects, or suggest alternate proofs of major ideas. Sig- 
nificant hints are given for the more difficult problems. The results of 
some problems are referred to in other problems or in the text itself. We 
cannot overemphasize the importance of the reader’s active involvement 
in carrying out the exercises and solving problems. 

While the book itself is not about applications, we have, for motiva- 
tional purposes, begun each chapter with a section outlining a few appli- 
cations to introduce the topic of the chapter. 

Readers who wish to consult alternate treatments of a topic for ad- 
ditional information are referred to the books listed in the References 
section following the appendices. These books are cited in the text using 
a brief mnemonic code; for example, a book by Jones and Smith might 
be referred to as [JSm]. The codes and complete citations appear alpha- 
betically by author in the References section. 

The list of book references is not exhaustive. As a practical concession 
to the limits of space in a general multitopic book, we have minimized 
the number of citations in the text. A small selection of references to 
papers - such as those we have explicitly used - does occur at the end of 
most sections accompanied by a brief discussion, but we have made no 
attempt to collect historical references to classical results. Extensive bib- 
liographies are provided in the more specialized books we have refer- 
enced. The reader should also be aware of broad and current biblio- 
graphical resources covering portions of matrix analysis such as the KWIC 
Index for Numerical Linear Algebra [CaLe] and sections 15 and 65 of the 
Mathematical Reviews. 

We appreciate the helpful suggestions of our colleagues and students 
who have taken the time to convey their reactions to the class notes and
preliminary manuscripts that were the precursors of the book. They include
Wayne Barrett, Leroy Beasley, Bryan Cain, David Carlson, Dipa
Choudhury, Risana Chowdhury, Yoo Pyo Hong, Dmitry Krass, Dale 
Olesky, Stephen Pierce, Leiba Rodman, and Pauline van den Driessche. 

R.A.H. 
C.R.J. 


CHAPTER 0 


Review and miscellanea 


0.0 Introduction 


The purpose of this chapter is to catalog briefly, without proof, a number 
of useful concepts and facts, many of which implicitly or explicitly under- 
lie the material covered in the main portion of the book. Much of this 
material would be included, in some form, in an elementary course in 
linear algebra, but we also include a number of useful items that are not 
commonly found elsewhere or that do not easily fit into the subsequent 
structure. Thus, this section may serve the reader as a short review prior 
to beginning the book or as a convenient reference when necessary. We 
also use this chapter to set basic notation and give some definitions; thus, 
reference to it will also be useful for these purposes. We do assume that 
the reader is already familiar with the elementary concepts of linear alge- 
bra and with mechanical aspects of matrix manipulations, such as matrix 
multiplication and addition. 


0.1 Vector spaces 


Though generally implicitly, and not usually explicitly, involved in the 
treatment in this book, a vector space is the fundamental setting for 
matrix theory. 


0.1.1 Scalar field. Underlying a vector space is the field, or set of
scalars, from which multiplication occurs. For our purposes, that
underlying field will almost always be the real numbers $\mathbf{R}$ or the
complex numbers $\mathbf{C}$ (see Appendix A) under the usual addition and
multiplication, but it could be the rational numbers, the integers modulo
a specified prime number, or some other field. When the field is
unspecified, we use the symbol $\mathbf{F}$. To qualify as a field, a set of
scalars must be closed under two specified binary operations (“addition”
and “multiplication”); both operations must be associative and commutative
and have an identity element in the set; inverses must exist in the set
for all elements under the addition operation and for all elements except
the additive identity (0) under the multiplication operation; the
multiplication operation must also be distributive over the addition
operation.
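
As an illustration of the inverse requirement, the following minimal sketch checks by brute force that every nonzero integer modulo a prime has a multiplicative inverse, while this fails modulo a composite number. The function name `has_all_inverses` and the moduli 7 and 6 are our own choices for illustration and do not appear in the text.

```python
def has_all_inverses(n):
    """Return True if every nonzero residue mod n has a multiplicative inverse mod n."""
    return all(
        any((a * b) % n == 1 for b in range(1, n))
        for a in range(1, n)
    )

print(has_all_inverses(7))  # True: the integers modulo the prime 7 form a field
print(has_all_inverses(6))  # False: 2 has no inverse modulo 6
```

The check covers only the multiplicative-inverse axiom; the remaining field axioms are inherited from ordinary integer arithmetic.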


0.1.2 Vector spaces. A vector space $V$ over a field $\mathbf{F}$ is a set
$V$ of objects (called vectors) which is closed under a binary operation
(“addition”) which is associative and commutative and has an identity (“0”)
and additive inverses in the set. The set is also closed under an operation
of left multiplication of the vectors by elements of the scalar field
$\mathbf{F}$, with the following properties for all $a, b \in \mathbf{F}$ and
all $x, y \in V$: $a(x+y) = ax + ay$, $(a+b)x = ax + bx$, $a(bx) = (ab)x$,
and $ex = x$ for the multiplicative identity $e \in \mathbf{F}$.

For a given field $\mathbf{F}$, the set $\mathbf{F}^n$ of n-tuples (n a
positive integer) with components from $\mathbf{F}$ forms a vector space over
$\mathbf{F}$ under the obvious operations (component-wise addition in
$\mathbf{F}^n$). The special cases $\mathbf{R}^n$ and $\mathbf{C}^n$ are the
basic vector spaces of this book. The set of polynomials with real or with
complex coefficients (of no more than a specified degree or of arbitrary
degree) and the set of real or complex valued continuous functions or
arbitrary functions on an interval $[a, b] \subset \mathbf{R}$ are also
examples of vector spaces (over $\mathbf{R}$ or $\mathbf{C}$). There is, of
course, a fundamental difference between the finite-dimensional space
$\mathbf{R}^n$ and the infinite-dimensional vector space of real-valued
continuous functions on [0, 1].


0.1.3 Subspaces and span. A subspace $U$ of a vector space $V$ is a subset
of $V$ that is, by itself, a vector space over the same scalar field. For
example, $\{[a, b, 0]^T : a, b \in \mathbf{R}\}$ is a subspace of
$\mathbf{R}^3$. Usually a subspace of a vector space $V$ is defined by some
relation that identifies particular elements of $V$ in such a way that the
resulting set is closed under the addition in $V$ - for example, the
elements of $\mathbf{R}^3$ with last component 0. It is in this regard that
it is useful to think of the resulting set as a subspace rather than as a
vector space in its own right. In any event, the intersection of two
subspaces is again a subspace.

If $S$ is a subset of a vector space $V$, the span of $S$ is the set

\[
\mathrm{Span}\, S = \{ a_1 v_1 + \cdots + a_k v_k : a_1, \ldots, a_k \in \mathbf{F},\ v_1, \ldots, v_k \in S,\ k = 1, 2, \ldots \}.
\]

Notice that Span $S$ is always a subspace even if $S$ is not a subspace. The
set $S$ is said to span the vector space $V$ if Span $S = V$.




0.1.4 Linear dependence and independence. A set of vectors
$\{x_1, x_2, \ldots, x_k\}$ in a vector space is said to be linearly
dependent if there exist coefficients $a_1, \ldots, a_k$, not all 0, in the
underlying scalar field $\mathbf{F}$ such that

\[
a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = 0.
\]

Equivalently, one of the $x_i$ terms is a linear combination, with
coefficients from $\mathbf{F}$, of the others. For example,
$\{[1, 2, 3]^T, [1, 0, -1]^T, [2, 2, 2]^T\}$ is a linearly dependent set in
$\mathbf{R}^3$. A subset of $V$ that is not linearly dependent over
$\mathbf{F}$ is said to be linearly independent. For example,
$\{[1, 2, 3]^T, [1, 0, -1]^T\}$ is a linearly independent set in
$\mathbf{R}^3$. It is important to note that both concepts intrinsically
pertain to sets of vectors. Any subset of a linearly independent set is
linearly independent; {0} is a linearly dependent set; and hence any set
which includes the 0 vector is linearly dependent. It can happen that a set
of vectors is linearly dependent, while any proper subset of it is linearly
independent.
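
The two examples above can be checked mechanically. The sketch below is one possible approach, not a method prescribed by the text: it computes the rank of a list of vectors by Gaussian elimination over the rationals (using exact `Fraction` arithmetic) and declares the set independent when the rank equals the number of vectors. The helper names `rank` and `is_linearly_independent` are ours.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors (treated as rows), by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    return rank(vectors) == len(vectors)

print(is_linearly_independent([[1, 2, 3], [1, 0, -1], [2, 2, 2]]))  # False
print(is_linearly_independent([[1, 2, 3], [1, 0, -1]]))             # True
print(is_linearly_independent([[0, 0, 0]]))                         # False
```

In the first example the dependence is visible directly: the third vector is the sum of the first two.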


0.1.5 Basis. A subset S of a vector space V is said to span V if every 
element of V may be represented as a linear combination (with coefficients
from the underlying scalar field) of elements of S. For example,
$\{[1, 0, 0]^T, [0, 1, 0]^T, [0, 0, 1]^T, [1, 0, -1]^T\}$ spans
$\mathbf{R}^3$ over $\mathbf{R}$ (or $\mathbf{C}^3$ over $\mathbf{C}$).
A linearly independent set which spans a vector space V is called a
basis for V. Bases are highly nonunique, but are very efficient in that each 
element of V can be represented in terms of the basis in one and only one 
way, and this is no longer true if any element whatsoever is appended to 
or deleted from the basis. An independent set in V is a basis of V if and 
only if no set which properly contains it is independent. A set that spans 
V is a basis for V if and only if no proper subset of it still spans V. Every 
vector space has a basis. 


0.1.6 Extension to a basis. Any linearly independent set in a vector
space $V$ may be extended to a basis of $V$; that is, given a linearly
independent set $\{x_1, x_2, \ldots, x_k\}$ in $V$, there exist additional
vectors $x_{k+1}, \ldots, x_n, \ldots \in V$ such that
$\{x_1, \ldots, x_n, \ldots\}$ is a basis of $V$. The extension of a given
independent set to a basis is, of course, not unique [for example, any
vector with nonzero third component may be appended to the independent set
$\{[1, 0, 0]^T, [0, 1, 0]^T\}$ to produce a basis of $\mathbf{R}^3$]. The
example of the real vector space C[0, 1] of real-valued continuous
functions on [0, 1] shows that a basis need not, in general, be finite; the
infinite set of monomials $\{1, x, x^2, x^3, \ldots\}$ is an independent set
in C[0, 1].


0.1.7 Dimension. If some basis of the vector space $V$ consists of a finite
number of elements, then all bases have the same number of elements; this
common number is called the dimension of the vector space $V$, and is
denoted by dim $V$. In this event, $V$ is said to be finite-dimensional;
otherwise $V$ is said to be infinite-dimensional. In the
infinite-dimensional case (e.g., C[0, 1]), there is a one-to-one
correspondence between the elements of any two bases. The real vector space
$\mathbf{R}^n$ has dimension n. The vector space $\mathbf{C}^n$ has
dimension n over the field $\mathbf{C}$ but has dimension 2n over the field
$\mathbf{R}$. The basis $\{e_1, e_2, \ldots, e_n\}$ in which $e_i$ has a 1 as
its ith component and 0’s elsewhere is sometimes called the standard basis
of $\mathbf{R}^n$ or $\mathbf{C}^n$.


0.1.8 Isomorphism. If $U$ and $V$ are vector spaces over the same scalar
field $\mathbf{F}$, and if $f: U \to V$ is an invertible function such that
$f(ax + by) = af(x) + bf(y)$ for all $x, y \in U$ and all
$a, b \in \mathbf{F}$, then $f$ is said to be an isomorphism and $U$ and $V$
are said to be isomorphic (“same-structure”). Two finite-dimensional vector
spaces over the same field are isomorphic if and only if they have the same
dimension; thus, any n-dimensional vector space over the field $\mathbf{F}$
is isomorphic to $\mathbf{F}^n$. Any n-dimensional real vector space is,
therefore, isomorphic to $\mathbf{R}^n$, and any n-dimensional complex
vector space is isomorphic to $\mathbf{C}^n$. Specifically, if $V$ is an
n-dimensional vector space over a field $\mathbf{F}$ with specified basis
$\mathcal{B} = \{x_1, \ldots, x_n\}$, then, since any element $x \in V$ may be
written uniquely as $x = a_1 x_1 + \cdots + a_n x_n$, $a_i \in \mathbf{F}$,
$i = 1, \ldots, n$, we may associate $x$ with the n-tuple
$[x]_{\mathcal{B}} = [a_1, \ldots, a_n]^T$, relative to the basis
$\mathcal{B}$. The mapping $x \to [x]_{\mathcal{B}}$ is an isomorphism
between $V$ and $\mathbf{F}^n$ for any basis $\mathcal{B}$.
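
To make the coordinate mapping concrete, here is a small worked example of our own (it is not taken from the text). Let $V$ be the real vector space of polynomials in $t$ of degree at most 2, with basis $\mathcal{B} = \{1, t, t^2\}$. Then

\[
p(t) = 3 - t + 2t^2 = 3 \cdot 1 + (-1) \cdot t + 2 \cdot t^2
\quad\Longrightarrow\quad
[p]_{\mathcal{B}} = [3, -1, 2]^T \in \mathbf{R}^3 .
\]

The coordinates depend on the basis chosen. Relative to the basis $\mathcal{B}' = \{1,\ 1 + t,\ 1 + t + t^2\}$ the same polynomial is

\[
p(t) = 4 \cdot 1 + (-3)(1 + t) + 2(1 + t + t^2)
\quad\Longrightarrow\quad
[p]_{\mathcal{B}'} = [4, -3, 2]^T .
\]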


0.2 Matrices 


The fundamental object of study here may be thought of in two impor- 
tant ways: as a rectangular array of scalars and as a linear transforma- 
tion between two vector spaces, given specified bases for each space. 


0.2.1 Rectangular arrays. A matrix is an m-by-n array of scalars from a
field $\mathbf{F}$. If m = n, the matrix is said to be square. The set of
all m-by-n matrices over $\mathbf{F}$ is denoted by $M_{m,n}(\mathbf{F})$,
and $M_{n,n}(\mathbf{F})$ is abbreviated to $M_n(\mathbf{F})$. In the most
common case in which $\mathbf{F} = \mathbf{C}$, the complex numbers,
$M_n(\mathbf{C})$ is further abbreviated to $M_n$, and
$M_{m,n}(\mathbf{C})$ to $M_{m,n}$. Matrices are usually denoted by capital
letters. For example, if

\[
A = \begin{bmatrix} 2 & -\tfrac{3}{2} & 0 \\ -1 & \pi & 4 \end{bmatrix}
\]

then $A \in M_{2,3}(\mathbf{R})$. A submatrix of a given matrix is a
rectangular array lying in specified subsets of the rows and columns of a
given matrix. For example, $[\pi \ \ 4]$ is a submatrix (lying in row 2 and
columns 2 and 3) of $A$, above.


0.2.2 Linear transformations. Let $U$ be an n-dimensional vector space and
$V$ be an m-dimensional vector space over the same scalar field
$\mathbf{F}$; let $\mathcal{B}_U$ be a basis of $U$ and $\mathcal{B}_V$ be a
basis of $V$. We may use the isomorphisms $x \to [x]_{\mathcal{B}_U}$ and
$y \to [y]_{\mathcal{B}_V}$ to represent vectors in $U$ and $V$ as n-tuples
and m-tuples over $\mathbf{F}$, respectively. A linear transformation is a
function $T: U \to V$ such that
$T(a_1 x_1 + a_2 x_2) = a_1 T(x_1) + a_2 T(x_2)$ for arbitrary scalars $a_1$
and $a_2$ and vectors $x_1$ and $x_2$. A matrix
$A \in M_{m,n}(\mathbf{F})$ corresponds to a linear transformation
$T: U \to V$ in the following way: The vector $y = T(x)$ if and only if
$[y]_{\mathcal{B}_V} = A[x]_{\mathcal{B}_U}$. The matrix $A$ is said to
represent the linear transformation $T$ (relative to the bases
$\mathcal{B}_U$ and $\mathcal{B}_V$); the representing matrix $A$ depends
upon the bases chosen. When we study the matrix $A$, we realize we are
studying a linear transformation relative to a particular choice of bases,
but explicit appeal to the bases is usually not necessary.


0.2.3 Vector spaces associated with a given matrix or linear
transformation. There is no loss of generality in associating an
n-dimensional vector space over $\mathbf{F}$ with $\mathbf{F}^n$, and we
shall think of $A \in M_{m,n}(\mathbf{F})$ as a linear transformation from
$\mathbf{F}^n$ to $\mathbf{F}^m$ (and also as an array). The domain of such a
linear transformation is $\mathbf{F}^n$; its range is
$\{y \in \mathbf{F}^m : y = Ax \text{ for some } x \in \mathbf{F}^n\}$. The
null space of $A$ is $\{x \in \mathbf{F}^n : Ax = 0\}$. The range of $A$ is a
subspace of $\mathbf{F}^m$, and the null space of $A$ is a subspace of
$\mathbf{F}^n$. We have the relation

\[
n = \text{dimension of the null space of } A + \text{dimension of the range of } A
\]

between these two subspaces.
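
A small worked example of our own (not from the text) illustrates the relation. Take

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \in M_{2,3}(\mathbf{R}), \qquad n = 3 .
\]

Every column of $A$ is a multiple of $[1, 2]^T$, so the range of $A$ is the line spanned by $[1, 2]^T$ in $\mathbf{R}^2$ and has dimension 1. The null space is $\{x \in \mathbf{R}^3 : x_1 + 2x_2 + 3x_3 = 0\}$, which has the basis $\{[-2, 1, 0]^T, [-3, 0, 1]^T\}$ and therefore dimension 2. The relation reads $3 = 2 + 1$.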


0.2.4 Matrix operations. Matrix addition is defined entry-wise for arrays
of the same dimensions and is denoted by + (“A + B”). It corresponds to
addition of linear transformations (relative to the same basis), and it
inherits commutativity and associativity from the scalar field. The zero
matrix (all entries zero) is the identity under addition, and
$M_{m,n}(\mathbf{F})$ is itself a vector space over $\mathbf{F}$. Matrix
multiplication is defined in the usual way, is denoted by juxtaposition,
$AB$, and corresponds to the composition of linear transformations. As such,
it is defined only when $A \in M_{m,n}(\mathbf{F})$,
$B \in M_{p,q}(\mathbf{F})$, and p = n; it is associative. It is not, in
general, commutative, for example,

\[
\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\ne
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}
\]

but it can be commutative when restricted to certain subsets of
$M_n(\mathbf{F})$, which are worthy of study. There is an identity under
matrix multiplication, the matrix $I \in M_n(\mathbf{F})$ of the form

\[
I = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}
\]

This matrix and all scalar multiples of it (called scalar matrices) commute
with all other matrices in $M_n(\mathbf{F})$ and are the only matrices which
do so. Matrix multiplication is distributive over matrix addition.

We note here that the symbol 0 is used throughout to denote each of the
following: the zero scalar, the zero vector (all components equal to the
zero scalar), and the zero matrix (all entries equal to the zero scalar).
Generally, the context will make clear which it is, so that confusion need
not result. We also use the symbol $I$ to denote the identity matrix of any
size. If there is potential for confusion, the dimension will be indicated.
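
The failure of commutativity can be seen by multiplying out the 2-by-2 example with factors [1 0; 0 2] and [1 2; 3 4]. The sketch below is a plain-Python illustration; `matmul` is our own helper (matrices are stored as lists of rows), not notation from the text.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows; requires len(A[0]) == len(B)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0], [0, 2]]
B = [[1, 2], [3, 4]]
print(matmul(A, B))  # [[1, 2], [6, 8]]
print(matmul(B, A))  # [[1, 4], [3, 8]]

# A scalar matrix, by contrast, commutes with B.
S = [[5, 0], [0, 5]]
print(matmul(S, B) == matmul(B, S))  # True
```

Left multiplication by the diagonal matrix scales the rows of B, while right multiplication scales its columns, which is why the two products differ.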


0.2.5 The transpose and Hermitian adjoint. If
$A = [a_{ij}] \in M_{m,n}(\mathbf{F})$, the transpose of $A$, denoted $A^T$,
is that matrix in $M_{n,m}(\mathbf{F})$ whose entries are $a_{ji}$; that is,
rows are exchanged for columns and vice versa. For example,

\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T
= \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\]

Of course, $(A^T)^T = A$. The Hermitian adjoint $A^*$ of
$A \in M_{m,n}(\mathbf{C})$ is defined by $A^* = \bar{A}^T$, where $\bar{A}$
is the component-wise conjugate. For example,

\[
\begin{bmatrix} 1+i & 2-i \\ -3 & -2i \end{bmatrix}^*
= \begin{bmatrix} 1-i & -3 \\ 2+i & 2i \end{bmatrix}
\]

Both the transpose and the Hermitian adjoint [and the inverse to be
discussed in (0.5)] obey the reverse-order law: $(AB)^* = B^* A^*$ and
$(AB)^T = B^T A^T$, assuming the product is defined. For the conjugate of a
product, there is no reversing: $\overline{AB} = \bar{A}\bar{B}$. If
$x, y \in M_{n,1} = \mathbf{C}^n$, then $y^* x$ is a scalar, and its
Hermitian adjoint and complex conjugate are the same; thus,
$(y^* x)^* = \overline{y^* x} = x^* y = y^T \bar{x}$.
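
The Hermitian adjoint and the reverse-order law can be checked numerically with Python's built-in complex numbers. This is a minimal sketch under our own naming (`adjoint`, `matmul`); the matrix B and the vectors x and y are arbitrary illustrative choices, not taken from the text.

```python
def adjoint(A):
    """Hermitian adjoint: transpose combined with entry-wise complex conjugation."""
    return [[complex(z).conjugate() for z in col] for col in zip(*A)]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# The adjoint of [1+i 2-i; -3 -2i] is [1-i -3; 2+i 2i].
A = [[1 + 1j, 2 - 1j], [-3, -2j]]
print(adjoint(A) == [[1 - 1j, -3], [2 + 1j, 2j]])  # True

# Reverse-order law: (AB)* = B* A*.
B = [[2, 1j], [1 - 1j, 3]]
print(adjoint(matmul(A, B)) == matmul(adjoint(B), adjoint(A)))  # True

# For column vectors x and y, the conjugate of y*x equals x*y.
x = [[1 + 2j], [3j]]
y = [[2 - 1j], [1 + 1j]]
ystar_x = matmul(adjoint(y), x)[0][0]
xstar_y = matmul(adjoint(x), y)[0][0]
print(ystar_x.conjugate() == xstar_y)  # True
```

Exact equality works here only because all entries are small Gaussian integers; with general floating-point data a tolerance would be needed.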




0.2.6  Metamechanics of matrix multiplication. We mention here some 
simple features of matrix multiplication that are useful again and again. 


1. If $b_j$ denotes the jth column of the matrix $B$, then the jth column
of the product $AB$ is just $Ab_j$.

2. If $a_i$ denotes the ith row of the matrix $A$, then the ith row of the
product $AB$ is just $a_i B$.


To paraphrase, in the product $AB$, left multiplication by $A$ multiplies
the columns of $B$ and right multiplication by $B$ multiplies the rows of
$A$. Further observations of this type, when one of the factors is
diagonal, are made in (0.9.1).


3. If $A \in M_{m,n}(\mathbf{F})$ and $x \in \mathbf{F}^n$, then $Ax$ is a
linear combination of the columns of $A$ (the coordinates of $x$ are the
coefficients).

4. If $A \in M_{m,n}(\mathbf{F})$ and $y \in \mathbf{F}^m$, then $y^T A$ is
a linear combination of the rows of $A$ (the coordinates of $y$ are the
coefficients).
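
The four observations can be verified on a small example. The sketch below uses plain Python lists; the helpers `matmul` and `column` and the particular matrices and vectors are our own illustrative assumptions.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def column(M, j):
    """The jth column of M as a flat list (0-based index)."""
    return [row[j] for row in M]

A = [[1, 2], [3, 4], [5, 6]]   # 3-by-2
B = [[1, 0, 2], [-1, 3, 1]]    # 2-by-3
AB = matmul(A, B)

# 1. The jth column of AB equals A times the jth column of B.
cols_ok = all(
    column(AB, j) == column(matmul(A, [[b] for b in column(B, j)]), 0)
    for j in range(3)
)

# 2. The ith row of AB equals the ith row of A times B.
rows_ok = all(AB[i] == matmul([A[i]], B)[0] for i in range(3))

# 3. Ax is the combination of the columns of A with coefficients from x.
x = [2, -1]
Ax = column(matmul(A, [[t] for t in x]), 0)
combo_cols = [x[0] * a0 + x[1] * a1 for a0, a1 in zip(column(A, 0), column(A, 1))]

# 4. y^T A is the combination of the rows of A with coefficients from y.
y = [1, 0, -2]
yTA = matmul([y], A)[0]
combo_rows = [y[0] * r0 + y[1] * r1 + y[2] * r2 for r0, r1, r2 in zip(A[0], A[1], A[2])]

print(cols_ok, rows_ok)   # True True
print(Ax, combo_cols)     # [0, 2, 4] [0, 2, 4]
print(yTA, combo_rows)    # [-9, -10] [-9, -10]
```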


0.3 Determinants 


Often in mathematics it is useful to summarize a multivariate phenom- 
enon with a single number, and the determinant is an example of this. It 
is defined only for square matrices $A \in M_n(\mathbf{F})$, and it may be
presented in two important, apparently different, but equivalent ways. We
denote the determinant of $A \in M_n(\mathbf{F})$ by det $A$.


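0.3.1 Laplace expansion. The determinant may be defined inductively for
$A = [a_{ij}] \in M_n(\mathbf{F})$ in the following way. Assume the
determinant is defined over $M_{n-1}(\mathbf{F})$ and let
$A_{ij} \in M_{n-1}(\mathbf{F})$ denote the submatrix of
$A \in M_n(\mathbf{F})$ resulting from the deletion of row $i$ and column
$j$. Then

\[
\sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}
= \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det A_{ij}
\]

for all $i \le n$, $j \le n$, and this common value is det $A$. On the left
the row index $i$ is fixed and the sum runs over $j$; on the right the
column index $j$ is fixed and the sum runs over $i$. The left-hand side is
the Laplace expansion by minors along row $i$, and the right-hand side is
the Laplace expansion along column $j$ [see (0.7.1)]. For any choice of row
or column, either expansion yields the determinant. This inductive
presentation begins by defining the determinant of a 1-by-1 matrix to be the
value of the single entry. Thus,

\[
\det [a_{11}] = a_{11}
\]

\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
= a_{11} a_{22} - a_{12} a_{21}
\]

\[
\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32}
- a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}
\]

and so on. It is also clear that $\det A^T = \det A$ and that
$\det A^* = \overline{\det A}$ if $A \in M_n(\mathbf{C})$.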
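
The inductive definition translates directly into a recursive procedure. The following sketch is one straightforward rendering, with our own function names (`minor`, `det_row`, `det_col`) and 0-based indices; since $(-1)^{i+j}$ depends only on the parity of $i + j$, shifting both indices by one does not change the signs. It is meant to illustrate the definition, not to be an efficient way to compute determinants.

```python
def minor(A, i, j):
    """Submatrix of A obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_row(A, i=0):
    """Laplace expansion along row i."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_row(minor(A, i, j)) for j in range(n))

def det_col(A, j=0):
    """Laplace expansion along column j."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_col(minor(A, i, j)) for i in range(n))

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print([det_row(A, i) for i in range(3)])  # [39, 39, 39]
print([det_col(A, j) for j in range(3)])  # [39, 39, 39]
```

Every choice of row or column gives the same value, as the text asserts.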


0.3.2 Alternating sum. As motivated by the low-dimensional examples above,
we also have for $A = [a_{ij}] \in M_n(\mathbf{F})$ that

\[
\det A = \sum_{\sigma} \mathrm{sgn}\,\sigma \prod_{i=1}^{n} a_{i\sigma(i)}
\]

where the sum runs over all n! permutations $\sigma$ of the n items
$\{1, \ldots, n\}$ and the “sign” or “signum” of a permutation $\sigma$,
sgn $\sigma$, is +1 or -1, according to whether the minimum number of
transpositions, or pair-wise interchanges, necessary to achieve it starting
from $\{1, 2, \ldots, n\}$ is even or odd. Thus, each product

\[
a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)}
\]

enters into the determinant with a + sign if the permutation $\sigma$ is
even or a - sign if it is odd.

If the coefficient sgn $\sigma$ is replaced by certain other functions, the
so-called generalized matrix functions result in place of det $A$. One
example is per $A$, the permanent of $A$, in which sgn is replaced by the
function which is identically 1.
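
The permutation formula can be evaluated directly for small matrices. In the sketch below the sign of a permutation is computed from the parity of its number of inversions, which agrees with the parity of the number of transpositions used in the definition above; that substitution, and the function names `sign`, `det`, and `per`, are our own choices for illustration.

```python
from itertools import permutations
from math import prod

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant as a signed sum over all permutations."""
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def per(A):
    """Permanent: the same sum with every sign replaced by +1."""
    n = len(A)
    return sum(prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n)))

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det(A))  # 39
print(per(A))  # 19
```

The sum has n! terms, so this is practical only for very small n.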


0.3.3 Elementary operations. Three simple and fundamental opera- 
tions may be used to put any matrix in a simple, unique (i.e., canonical) 
form appropriate to it for purposes such as solving linear equations, cal- 
culating determinants, matrix inversion, and determination of rank. Con- 
centrating upon rows, they are as follows. 


Type 1: Interchange of two rows 


The interchange of rows $i$ and $j$ may be effected via left multiplication by the matrix

$$\begin{bmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & 1 & & \\
& & \vdots & \ddots & \vdots & & \\
& & 1 & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{bmatrix}$$

in which the two off-diagonal 1’s are in the $i, j$ and $j, i$ positions, the two diagonal 0’s are in the $i, i$ and $j, j$ positions, and all unspecified entries are 0.


Type 2: Multiplication of a row by a nonzero scalar 


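Multiplication of row $i$ of $A$ by the nonzero scalar $c$ may be accomplished via left multiplication by the matrix

$$\begin{bmatrix}
1 & & & & \\
& \ddots & & & \\
& & c & & \\
& & & \ddots & \\
& & & & 1
\end{bmatrix}$$

in which the scalar $c$ occurs in the $i, i$ position.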


Type 3: Addition of a scalar multiple of one row to another row 


Addition of $c$ times row $i$ to row $j$ results from left multiplication of $A$ by the matrix

$$\begin{bmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & \vdots & \ddots & & & \\
& & c & \cdots & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{bmatrix}$$

in which the scalar $c$ occurs in the $j, i$ position. Note that the matrices of each of the three elementary operations are just the result of the operation applied to the identity matrix $I$.

The effect of a type 1 operation upon the determinant is to multiply it 
by —1; the effect of a type 2 operation is to multiply it by the scalar c; a 
type 3 elementary operation does not change the determinant. It follows 
that a matrix with a zero row, or with two dependent rows, or with any k 
rows dependent, has determinant zero. A matrix has determinant zero if 
and only if a subset of its rows is linearly dependent. 
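The three elementary matrices can be built exactly as described in (0.3.3), by applying each operation to the identity. The following is a small sketch in Python with 0-based row indices; the helper names `interchange`, `scale`, and `add_multiple` are hypothetical, chosen only for this illustration.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def interchange(n, i, j):
    """Type 1: the identity with rows i and j interchanged."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E


def scale(n, i, c):
    """Type 2: the identity with row i multiplied by the nonzero scalar c."""
    E = identity(n)
    E[i][i] = c
    return E


def add_multiple(n, i, j, c):
    """Type 3: the identity with c times row i added to row j (i != j)."""
    E = identity(n)
    E[j][i] = c  # the scalar sits in the j, i position
    return E


A = [[1, 2], [3, 4], [5, 6]]
print(matmul(interchange(3, 0, 2), A))      # [[5, 6], [3, 4], [1, 2]]
print(matmul(scale(3, 1, 10), A))           # [[1, 2], [30, 40], [5, 6]]
print(matmul(add_multiple(3, 0, 2, 2), A))  # [[1, 2], [3, 4], [7, 10]]
```

Left multiplication acts on rows; the same matrices applied on the right would act on columns instead.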


0.3.4 Row-reduced echelon form. To each $A \in M_{m,n}(F)$ there corresponds a canonical form in $M_{m,n}(F)$, the row-reduced echelon form
(RREF) of A, which may be attained by a (nonunique) sequence of ele- 
mentary operations. Many matrices have the same RREF, but each 
matrix has only one RREF regardless of the sequence of elementary 
operations used to attain it. The defining specifications of the RREF 
are: 


(a) Each nonzero row has 1 as its first nonzero entry;

(b) All other entries in the column of such a leading 1 are equal to 0; 

(c) Any rows consisting entirely of zeroes occur at the bottom of the 
matrix; and 

(d) The leading 1’s occur in a “stairstep” pattern, left to right; that 
is, a leading 1 in a lower row must occur to the right of its coun- 
terpart above it. 


For example,

$$\begin{bmatrix}
0 & 1 & -1 & 0 & 0 & 2 \\
0 & 0 & 0 & 1 & 0 & \pi \\
0 & 0 & 0 & 0 & 1 & 4 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

is in RREF. The determinant of $A \in M_n(F)$ is nonzero if and only if its RREF is the identity matrix $I \in M_n(F)$ (whose determinant is 1). The value of $\det A$ may be calculated by recording the effects upon the determinant of each of the elementary operations which lead to the RREF.
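The bookkeeping just described can be sketched as follows. This is an illustrative Python sketch, assuming exact rational arithmetic via `fractions.Fraction` (our choice, to avoid round-off); the name `det_by_elimination` is hypothetical.

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant of a square matrix, found by reducing it to RREF and
    recording the effect of each elementary operation (0.3.3)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # the RREF cannot be the identity
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]  # type 1: changes the sign
            det = -det
        c = A[col][col]
        A[col] = [x / c for x in A[col]]  # type 2: row scaled by 1/c
        det *= c                          # so the original determinant carries a factor c
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                # type 3: no effect on the determinant
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return det  # the RREF is now the identity, whose determinant is 1


print(det_by_elimination([[1, 2], [3, 4]]))  # -2
```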

For the system of linear equations $Ax = b$ with $A \in M_{m,n}(F)$ and $b \in F^m$ given and $x \in F^n$ unknown, the set of solutions is unchanged by
elementary operations performed consistently upon A and b. The solu- 
tions may be read off from the RREF of [A b]. In fact, the RREF is 
unique, and two systems Ax =b are solution-equivalent (have the same 
set of solutions) if and only if the two augmented matrices [A b] have the 
same RREF. 

We shall discuss the role of the RREF in rank and inversion later. 


0.3.5 Multiplicativity. The most crucial and important property of the 
determinant function is that it is multiplicative: For $A, B \in M_n(F)$

$$\det AB = \det A \det B$$


This may be proved using elementary operations which row-reduce both 
A and B. 


0.3.6 Functional characterization of the determinant. Thought of as a 
function of each row (or column) separately with the others fixed, the 
determinant is a linear function of the entries in the given row (or 
column). This is clear from the Laplace expansion: The coefficient of a 
given entry is just $\pm$ the complementary minor, which is fixed. A function that is linear with respect, in turn, to each of the subsets of a given partition of its arguments is called multilinear. This is a rather broad class. For example, the function $f(x_1, x_2) = x_1 x_2$ is multilinear, with the partition being $\{x_1\}, \{x_2\}$. Thus, the determinant is multilinear as a function of the entries of a matrix, with the partition corresponding to the rows (or columns).

It is natural to ask if any subset of the properties of the determinant 
noted thus far characterize it as a scalar-valued function of the $n^2$ entries of $A \in M_n$. The determinant is the unique function $f: M_n(F) \to F$ that is

(a) Multilinear;
(b) Alternating: the type 1 operation multiplies the result by $-1$; and
(c) Normalized: $f(I) = 1$, where $I \in M_n(F)$ is the identity matrix.


The permanent function is also multilinear (as are other generalized 
matrix functions) and it is normalized, but it is not alternating. 


0.4 Rank 


An important nonnegative integer associated with each matrix $A \in M_{m,n}(F)$ is its rank, which we denote by rank $A$.


0.4.1 Definition. If $A \in M_{m,n}(F)$, rank $A$ is the largest number of columns of $A$ that constitute a linearly independent set. This set of columns is not, of course, unique, but the cardinality [number of elements] of this set is unique. It is a remarkable fact that $\operatorname{rank} A^T = \operatorname{rank} A$. Therefore,
rank may equivalently be defined in terms of linearly independent rows. 
Often this is phrased as “row rank = column rank.” 


0.4.2 Rank and linear systems. The linear system Ax = b (0.3.4) may 
have 0, 1, or infinitely many solutions, but these are the only possibilities. 
If there is at least one solution, the system is said to be consistent. The 
linear system is consistent if and only if rank [A b] =rank A. The m-by- 
(n+1) matrix [A b] is called the augmented matrix, and to say that the 
augmented matrix and the coefficient matrix A have the same rank is just 
to say that b is a linear combination of the columns of A. In this case, 
appending b to the columns of A does not increase the rank. A solution 
of the linear system $Ax = b$ is a vector $x$ of coefficients which give $b$ as a
linear combination of the columns of A. 


0.4.3. RREF and rank. Elementary operations do not change the rank 
of a matrix, and thus rank A is the same as the rank of the RREF of A, 
which is just the number of nonzero rows in the RREF. Calculation of 
the rank by calculation of the RREF suffers from ill-conditioning: 
Round-off errors in intermediate numerical calculations can make zero 
rows of the RREF appear to be nonzero, thereby affecting perception of 
the rank. 
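For matrices with rational entries, the round-off problem can be sidestepped by reducing in exact arithmetic. The following is a minimal sketch, assuming `fractions.Fraction` entries; `rref` and `rank` are illustrative names, and this is not meant as a numerically robust method for floating-point data.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduced echelon form of an m-by-n matrix, in exact arithmetic."""
    A = [[Fraction(x) for x in row] for row in rows]
    m = len(A)
    n = len(A[0]) if m else 0
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break
        pivot = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue  # no leading 1 in this column
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]   # type 1
        c = A[pivot_row][col]
        A[pivot_row] = [x / c for x in A[pivot_row]]      # type 2: leading entry becomes 1
        for r in range(m):
            if r != pivot_row and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[pivot_row])]  # type 3
        pivot_row += 1
    return A


def rank(rows):
    """Number of nonzero rows in the RREF."""
    return sum(1 for row in rref(rows) if any(x != 0 for x in row))


print(rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 2
```

Because every entry is an exact fraction, a row is reported as zero only when it really is zero.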


0.4.4 Characterizations of rank. The following statements about a 
given matrix $A \in M_{m,n}(F)$ are all equivalent; each can be useful in a
different context. 




(a) rank A=k; 

(b) There exist k, and no more than k, rows of A that constitute a 
linearly independent set; 

(c) There exist k, and no more than k, columns of A that consti- 
tute a linearly independent set; 

(d) There is a $k$-by-$k$ submatrix of $A$ with nonzero determinant, but
all $(k+1)$-by-$(k+1)$ submatrices of $A$ have determinant 0;

(e) The dimension of the range of A is k; 

(f) There is a set of k, but no more than k, linearly independent 
vectors b such that the linear system Ax = b is consistent; and 

(g) $k = n - \dim(\text{null space of } A)$.


0.4.5 Rank inequalities. Some fundamental inequalities involving the 
rank are the following. 


(a) For $A \in M_{m,n}(F)$, $\operatorname{rank} A \le \min\{m, n\}$.

(b) If rows and/or columns are deleted from a matrix, the rank of 
the resulting submatrix cannot be greater than the rank of the 
original matrix. 

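(c) If $A \in M_{m,k}(F)$ and $B \in M_{k,n}(F)$,

$$(\operatorname{rank} A + \operatorname{rank} B) - k \le \operatorname{rank} AB \le \min\{\operatorname{rank} A, \operatorname{rank} B\}$$

(d) If $A, B \in M_{m,n}(F)$, $\operatorname{rank}(A + B) \le \operatorname{rank} A + \operatorname{rank} B$.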


Somewhat more subtle is an inequality of Frobenius, from which others 
follow: 


(e) If $A \in M_{m,k}(F)$, $B \in M_{k,p}(F)$, and $C \in M_{p,n}(F)$, then
$\operatorname{rank} AB + \operatorname{rank} BC \le \operatorname{rank} B + \operatorname{rank} ABC$.
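As one illustration of how other inequalities follow, the lower bound in (c) can be recovered from the Frobenius inequality. The short derivation below is a sketch under the assumption that we specialize (e) to $p = k$ and $B = I_k$, relabeling the factors as $A \in M_{m,k}(F)$ and $C \in M_{k,n}(F)$.

```latex
\begin{align*}
&\text{Take } B = I_k \in M_k(F) \text{ in (e), with } A \in M_{m,k}(F),\ C \in M_{k,n}(F):\\
&\operatorname{rank}(A I_k) + \operatorname{rank}(I_k C)
   \le \operatorname{rank} I_k + \operatorname{rank}(A I_k C)\\
&\Longrightarrow\quad \operatorname{rank} A + \operatorname{rank} C \le k + \operatorname{rank} AC\\
&\Longrightarrow\quad (\operatorname{rank} A + \operatorname{rank} C) - k \le \operatorname{rank} AC .
\end{align*}
```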


0.4.6 Rank equalities. 


(a) If $A \in M_{m,n}(C)$, $\operatorname{rank} A^* = \operatorname{rank} A^T = \operatorname{rank} \bar{A} = \operatorname{rank} A$.

(b) If $A \in M_m(F)$ and $C \in M_n(F)$ are nonsingular and $B \in M_{m,n}(F)$,
then $\operatorname{rank} AB = \operatorname{rank} B = \operatorname{rank} BC = \operatorname{rank} ABC$; that is, rank is
unchanged upon left or right multiplication by a nonsingular
matrix.

(c) If $A, B \in M_{m,n}(F)$, then $\operatorname{rank} A = \operatorname{rank} B$ if and only if there exist
nonsingular $X \in M_m(F)$ and $Y \in M_n(F)$ such that $B = XAY$.

(d) If $A \in M_{m,n}(C)$, $\operatorname{rank} A^*A = \operatorname{rank} A$.

(e) If $A \in M_{m,n}(F)$ has rank $k$, then

$$A = XBY$$

where $X \in M_{m,k}(F)$, $Y \in M_{k,n}(F)$, and $B \in M_k(F)$ is nonsingular.
In particular, a matrix $A$ that has rank 1 may always be
written in the form $A = xy^T$ for some $x \in F^m$, $y \in F^n$.


0.5 Nonsingularity 


A linear transformation or matrix is said to be nonsingular if it pro- 
duces the output 0 only for the input 0. Otherwise, it is singular. If 
$A \in M_{m,n}(F)$ and $m < n$, then $A$ is necessarily singular. If $A \in M_n(F)$, $A$
is called invertible if there is a matrix $A^{-1} \in M_n(F)$ called the inverse of $A$
such that $A^{-1}A = I$. Equivalently, $A$ is invertible if the linear transformation
$A$ is one-to-one, and its inverse transformation (also linear) exists. If
$A \in M_n$ and $A^{-1}A = I$, then $AA^{-1} = I$; $A^{-1}$ is unique whenever it exists.
It is useful to be able to recognize in several different ways whether
$A \in M_n(F)$ is nonsingular. The following are equivalent if $A \in M_n(F)$:


(a) A is nonsingular; 

(b) $A^{-1}$ exists;

(c) $\operatorname{rank} A = n$;

(d) The rows of $A$ are linearly independent;

(e) The columns of $A$ are linearly independent;

(f) $\det A \ne 0$;

(g) The dimension of the range of $A$ is $n$;

(h) The dimension of the null space of $A$ is 0;

(i) $Ax = b$ is consistent for each $b \in F^n$;

(j) If $Ax = b$ is consistent, then the solution is unique;

(k) $Ax = b$ has a unique solution for each $b \in F^n$;

(l) The only solution to $Ax = 0$ is $x = 0$; and
(m) 0 is not an eigenvalue of A (see Chapter 1). 


The nonsingular matrices in M,(F) form a group, the general linear 
group, often denoted GL(n, F). 


0.6 The usual inner product 


We adopt the convention of considering elements of $F^n$ to be column
vectors [i.e., $F^n = M_{n,1}(F)$]. Thus if $x \in C^n$, $x^T$ and $x^*$ are row vectors.
Note that if $x \in R^n$, $x^* = x^T$.


0.6.1 Definition. The scalar $y^*x$ is an inner product (scalar product) of $x$ and $y \in C^n$ and is often denoted $\langle x, y \rangle = y^*x$. Since it is possible to define inner products other than this one, we refer to this one as the usual or standard inner product on the vector space $C^n$. Note that $\langle \cdot, \cdot \rangle$ is linear in the first argument ($\langle \alpha x_1 + \beta x_2, y \rangle = \alpha \langle x_1, y \rangle + \beta \langle x_2, y \rangle$ for all $\alpha, \beta \in C$ and $x_1, x_2 \in C^n$) and is conjugate linear in the second ($\langle x, \alpha y_1 + \beta y_2 \rangle = \bar{\alpha} \langle x, y_1 \rangle + \bar{\beta} \langle x, y_2 \rangle$ for all $\alpha, \beta \in C$ and $y_1, y_2 \in C^n$).


0.6.2 Orthogonality. Two vectors $x, y \in C^n$ are called orthogonal if $\langle y, x \rangle = 0$. In two and three dimensions, this has the conventional geometric interpretation of perpendicular. A set of vectors $\{x_1, \dots, x_k\} \subset C^n$
is said to be orthogonal if each pair of vectors in the set is orthogonal. A 
set of orthogonal vectors, none of which is the zero vector, is necessarily 
linearly independent. 


0.6.3 The Cauchy-Schwarz inequality. If $x \in C^n$, the nonnegative scalar $\langle x, x \rangle^{1/2}$ is the Euclidean length of $x$. A vector whose Euclidean length is 1 is said to be normalized (or, sometimes, a unit vector). For any nonzero vector $x \in C^n$, $x / \langle x, x \rangle^{1/2}$ is a normalized vector which points in the same direction as $x$. The fundamental Cauchy-Schwarz inequality states that

$$|\langle y, x \rangle| \le \langle x, x \rangle^{1/2} \langle y, y \rangle^{1/2}$$

for all $x, y \in C^n$ with equality if and only if $x$ and $y$ are dependent. Generalizing the notion of orthogonality, the angle $\theta$ between two nonzero vectors $x, y \in C^n$ may be defined unambiguously by

$$\cos \theta = \frac{|\langle y, x \rangle|}{\langle x, x \rangle^{1/2} \langle y, y \rangle^{1/2}}, \qquad 0 \le \theta \le \frac{\pi}{2}$$


0.6.4 Gram-Schmidt orthonormalization. It is intuitively plausible 
that a set of linearly independent vectors (which form a basis for their 
span) may be replaced by an orthonormal (orthogonal and individually 
normalized) basis for the same space. Although this replacement may, in 
principle, be carried out in infinitely many ways, there is a very simple 
and far-reaching algorithm for carrying out this replacement: the Gram- 
Schmidt (orthonormalization) process. Let $\{x_1, \dots, x_n\}$ be a set of $n$ linearly independent vectors in a complex vector space and let $\{z_1, \dots, z_n\}$ be the orthogonal set of normalized vectors to be determined. The $z_i$ may be calculated in turn as follows. Let $y_1 = x_1$ and choose

$$z_1 = \frac{y_1}{\langle y_1, y_1 \rangle^{1/2}}$$

so that $z_1$ is normalized. Let $y_2 = x_2 - \langle x_2, z_1 \rangle z_1$, so that $y_2$ is orthogonal to $z_1$, and choose

$$z_2 = \frac{y_2}{\langle y_2, y_2 \rangle^{1/2}}$$

so that $z_2$ is normalized and is orthogonal to $z_1$. The process continues similarly. Assuming $z_1, \dots, z_{k-1}$ have been determined, let

$$y_k = x_k - \langle x_k, z_{k-1} \rangle z_{k-1} - \langle x_k, z_{k-2} \rangle z_{k-2} - \cdots - \langle x_k, z_1 \rangle z_1$$

so that $y_k$ is orthogonal to $z_1, \dots, z_{k-1}$, and again normalize $y_k$ to get

$$z_k = \frac{y_k}{\langle y_k, y_k \rangle^{1/2}}$$

Continue until the desired orthonormal vectors $z_1, \dots, z_n$ have been produced. Note that an infinite orthonormal set could be produced from a countably infinite linearly independent set in an infinite-dimensional vector space in this way.

At each step in the Gram-Schmidt process, the orthonormal vectors $z_1, \dots, z_k$ are linear combinations of the original independent vectors $x_1, \dots, x_k$ only (and vice versa). If we denote $Z = [z_1 \; z_2 \; \cdots \; z_n]$ and $X = [x_1 \; x_2 \; \cdots \; x_n]$, matrices that have as columns the vectors $z_i$ and $x_i$, respectively, then $Z = XR$, where the matrix $R = [r_{ij}]$ is nonsingular and upper triangular; that is, $r_{ij} = 0$ whenever $i > j$.

Finally, we note that the Gram-Schmidt process may be applied to any finite or countable (not necessarily linearly independent) sequence of vectors. If the set is not independent, it will produce a vector $y_k = 0$ for the least value of $k$ for which $\{x_1, \dots, x_k\}$ is a linearly dependent set. In this case, $x_k$ is a linear combination of $x_1, \dots, x_{k-1}$. Substitution of $x_{k+1}$ for $x_k$ and continuation of the Gram-Schmidt process can answer such questions as: What is a basis for, or the dimension of, the span of $\{x_1, \dots, x_n\}$?
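The process of (0.6.4) translates almost line for line into code. The following is a minimal Python sketch using the usual inner product $\langle x, y \rangle = y^*x$ on $C^n$; the tolerance `tol`, used to decide when a computed $y_k$ should be treated as 0 in floating-point arithmetic, is an assumption of this sketch and is not part of the process as stated.

```python
def inner(x, y):
    """Usual inner product <x, y> = y* x: linear in x, conjugate linear in y."""
    return sum(complex(xi) * complex(yi).conjugate() for xi, yi in zip(x, y))


def gram_schmidt(xs, tol=1e-12):
    """Return orthonormal vectors z_1, z_2, ... spanning the same space as xs.

    A vector x_k that depends on its predecessors yields y_k = 0 (up to tol)
    and is skipped, so the length of the result is the dimension of the span.
    """
    zs = []
    for x in xs:
        y = [complex(v) for v in x]
        for z in zs:
            c = inner(x, z)  # the coefficient <x_k, z_j>
            y = [yi - c * zi for yi, zi in zip(y, z)]
        length = abs(inner(y, y)) ** 0.5
        if length <= tol:
            continue  # x_k is a linear combination of earlier vectors
        zs.append([yi / length for yi in y])
    return zs


# The third vector is the sum of the first two, so only two z's are produced.
zs = gram_schmidt([[1, 1, 0], [1, 0, 1], [2, 1, 1]])
print(len(zs))  # 2
```

The coefficients are computed from the original $x_k$, as in the displayed formula for $y_k$; computing them from the partially reduced $y$ instead is a common variant with better floating-point behavior, but it is a different arrangement of the same process.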


0.6.5 Orthonormal bases. An orthonormal set of vectors is just an 
orthogonal set of vectors, each of which is normalized. Such a set cannot 
contain the vector 0 and is necessarily linearly independent. An ortho- 
normal basis is a basis whose vectors constitute an orthonormal set. 
Since any basis may be transformed to an orthonormal basis (via Gram- 
Schmidt), any finite-dimensional complex vector space has an ortho- 
normal basis. Such a basis is pleasant to work with, since the cross-terms 
in inner product calculations all vanish. 


0.6.6 Orthogonal complements. Given any subset $S \subset C^n$, the orthogonal complement of $S$ is the set

$$S^\perp = \{x \in C^n : x^*y = 0 \text{ for all } y \in S\}$$

Even if $S$ is not a subspace, $S^\perp$ is always a subspace. We have $(S^\perp)^\perp = \operatorname{Span} S$, and $(S^\perp)^\perp = S$ if $S$ is a subspace. It is always the case that $\dim S^\perp + \dim (S^\perp)^\perp = n$. In the context of the linear system $Ax = b$, $A \in M_{m,n}$, it is worth noting that the range of $A$ is just the orthogonal complement of the null space of $A^*$; that is, $Ax = b$ has a solution (not necessarily unique) if and only if $b^*z = 0$ for all $z \in C^m$ such that $A^*z = 0$.


0.7 Partitioned matrices 


Analogous to a partition of a set, a partition of a matrix is an exhaustive 
decomposition of the matrix into mutually exclusive submatrices such 
that each entry of the original matrix falls in one and only one submatrix 
of the partition. Partitioning of matrices is often a convenient device for 
perception of useful structure. 


0.7.1 Submatrices. Let $A \in M_{m,n}(F)$. For index sets $\alpha \subseteq \{1, \dots, m\}$ and $\beta \subseteq \{1, \dots, n\}$, we denote the (sub)matrix that lies in the rows of $A$ indexed by $\alpha$ and the columns indexed by $\beta$ as $A(\alpha, \beta)$. For example,

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}(\{1, 3\}, \{1, 2, 3\}) = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}$$

If $m = n$ and $\beta = \alpha$, the submatrix $A(\alpha, \alpha)$ is called a principal submatrix of $A$ and is abbreviated $A(\alpha)$. Often it is convenient to indicate a submatrix or principal submatrix via deletion, rather than inclusion, of rows or columns. This may be accomplished by complementing the index sets. For example, $A(\alpha', \beta')$ is the result of deleting the rows indicated by $\alpha$ and the columns indicated by $\beta$.

The determinant of a square submatrix of the matrix $A$ is called a minor of $A$. If the submatrix is a principal submatrix, then the minor is a principal minor. A signed minor, such as those appearing in the Laplace expansion (0.3.1) [$(-1)^{i+j} \det A_{ij}$] is called a cofactor of $A$. By convention, the empty principal minor is 1; that is, $\det A(\emptyset) = 1$.


0.7.2 Partitions and multiplication. If $\alpha_1, \dots, \alpha_t$ constitute a partition of $\{1, \dots, m\}$ and $\beta_1, \dots, \beta_s$ constitute a partition of $\{1, \dots, n\}$, then the matrices $A(\alpha_i, \beta_j)$ form a partition of the matrix $A \in M_{m,n}(F)$, $1 \le i \le t$, $1 \le j \le s$. If $A \in M_{m,n}(F)$ and $B \in M_{n,p}(F)$ are partitioned so that the two partitions of $\{1, \dots, n\}$ coincide, the two matrix partitions are said to be conformal. In this event,

$$[AB](\alpha_i, \gamma_j) = \sum_{k} A(\alpha_i, \beta_k) B(\beta_k, \gamma_j)$$

where $A(\alpha_i, \beta_k)$ and $B(\beta_k, \gamma_j)$ are conformal partitions of $A$ and $B$. The
left-hand side is a submatrix of the product AB (calculated in the usual 
way), and each summand on the right is a standard matrix product. 
Thus, multiplication of conformally partitioned matrices mimics usual 
matrix multiplication. Addition of partitioned matrices also makes sense 
when the summands are partitioned identically. 


0.7.3. The inverse of a partitioned matrix. It is sometimes useful to 
know the corresponding blocks in the inverse of a partitioned, nonsin- 
gular matrix A, that is, to present the inverse of a partitioned matrix in 
correspondingly partitioned form. This may be done in a variety of 
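apparently different, but equivalent, ways - assuming that certain submatrices of $A \in M_n(F)$ and $A^{-1}$ are also nonsingular. For simplicity, let $A$ be partitioned as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

with $A_{ii} \in M_{n_i}(F)$, $i = 1, 2$ and $n_1 + n_2 = n$. A useful expression for the correspondingly partitioned presentation of $A^{-1}$ is

$$A^{-1} = \begin{bmatrix} [A_{11} - A_{12}A_{22}^{-1}A_{21}]^{-1} & A_{11}^{-1}A_{12}[A_{21}A_{11}^{-1}A_{12} - A_{22}]^{-1} \\ [A_{21}A_{11}^{-1}A_{12} - A_{22}]^{-1}A_{21}A_{11}^{-1} & [A_{22} - A_{21}A_{11}^{-1}A_{12}]^{-1} \end{bmatrix}$$

assuming that all the relevant inverses exist. Or, in general index set notation, we may write

$$A^{-1}(\alpha) = [A(\alpha) - A(\alpha, \alpha')A(\alpha')^{-1}A(\alpha', \alpha)]^{-1}$$

and

$$A^{-1}(\alpha, \alpha') = A(\alpha)^{-1}A(\alpha, \alpha')[A(\alpha', \alpha)A(\alpha)^{-1}A(\alpha, \alpha') - A(\alpha')]^{-1}$$

again assuming that the relevant inverses exist. Other presentations are possible. Note that $A^{-1}(\alpha)$ is a submatrix of $A^{-1}$, while $A(\alpha)^{-1}$ is the inverse of a submatrix of $A$, and these two objects are not, in general, the same.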


0.7.4 The inverse of a small-rank adjustment. If the inverse of a given 
matrix is known, it is also often useful to know how the inverse changes 
upon addition of a matrix of “small” rank. There is a simple formula 
that, if the form of the adjustment matrix is sufficiently simple, can 
make the new inverse simpler to compute than starting from scratch. 
Suppose a nonsingular matrix $A \in M_n(F)$ has a known inverse $A^{-1}$ and consider

$$B = A + XRY$$

where $X$ is $n$-by-$r$, $Y$ is $r$-by-$n$, and $R$ is $r$-by-$r$ and nonsingular. If $B$ is nonsingular, then

$$B^{-1} = A^{-1} - A^{-1}X(R^{-1} + YA^{-1}X)^{-1}YA^{-1}$$

If $r$ is much smaller than $n$, then $R$ and $R^{-1} + YA^{-1}X$ may be much easier to invert than $B$, and if $A$ is easy to invert and has a form that renders the multiplications simple, then use of this formula may be competitive with direct inversion of $B$. For example, if the adjustment has rank 1, $X$ is $n$-by-1, $Y$ is 1-by-$n$, and $R = [1]$, the formula becomes

$$B^{-1} = A^{-1} - \frac{1}{1 + YA^{-1}X} A^{-1}XYA^{-1}$$

(note that $XY = B - A$ in this case). In particular, if

$$B = I + xy^T$$

for $x, y \in F^n$, $I \in M_n(F)$, then

$$B^{-1} = I - \frac{1}{1 + y^Tx} xy^T$$

if $y^Tx \ne -1$.
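The last formula is easy to check on a small example. The following sketch, in exact rational arithmetic, builds $B = I + xy^T$ and the candidate inverse and multiplies them out; the function name `rank_one_update_inverse` and the sample vectors are our own choices for illustration.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def rank_one_update_inverse(x, y):
    """Inverse of I + x y^T from the formula above, assuming y^T x != -1."""
    n = len(x)
    denom = 1 + sum(Fraction(a) * b for a, b in zip(y, x))  # 1 + y^T x
    if denom == 0:
        raise ValueError("I + x y^T is singular because y^T x = -1")
    I = identity(n)
    return [
        [I[i][j] - Fraction(x[i]) * y[j] / denom for j in range(n)]
        for i in range(n)
    ]


x = [1, 2, 3]
y = [4, 5, 6]
B = [[int(i == j) + x[i] * y[j] for j in range(3)] for i in range(3)]
B_inv = rank_one_update_inverse(x, y)
print(matmul(B, B_inv) == identity(3))  # True
```

Only an inner product and an outer product are needed here, which is the point of the formula when $n$ is large and the adjustment has small rank.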


0.8 Determinants again 


Some additional facts about and identities for the determinant are use- 
ful for reference. Most of these are not generally found in elementary 
treatments. 


0.8.1 Compound matrices. The array of all minors of a given size 
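from a given matrix $A \in M_{m,n}(F)$ is called a compound matrix of $A$. In particular, the $\binom{m}{k}$-by-$\binom{n}{k}$ matrix whose $\alpha, \beta$ entry is $\det A(\alpha, \beta)$ is called the $k$th compound matrix of $A$ and is denoted by $C_k(A)$. Here, $\alpha \subseteq \{1, \dots, m\}$ and $\beta \subseteq \{1, \dots, n\}$ are index sets of cardinality $k \le \min\{m, n\}$, usually ordered lexicographically, that is, $\{1, 2, 4\}$ before $\{1, 2, 5\}$ before $\{1, 3, 4\}$ and so on. For example, if

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

then

$$C_2(A) = \begin{bmatrix}
\det \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} & \det \begin{bmatrix} 1 & 3 \\ 4 & 6 \end{bmatrix} & \det \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix} \\
\det \begin{bmatrix} 1 & 2 \\ 7 & 8 \end{bmatrix} & \det \begin{bmatrix} 1 & 3 \\ 7 & 9 \end{bmatrix} & \det \begin{bmatrix} 2 & 3 \\ 8 & 9 \end{bmatrix} \\
\det \begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} & \det \begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix} & \det \begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix}
\end{bmatrix}
= \begin{bmatrix} -3 & -6 & -3 \\ -6 & -12 & -6 \\ -3 & -6 & -3 \end{bmatrix}$$

If $A \in M_{m,k}(F)$ and $B \in M_{k,n}(F)$, then

$$C_r(AB) = C_r(A)C_r(B), \qquad r \le \min\{m, k, n\}$$

Also,

$$C_k(tA) = t^k C_k(A), \qquad t \in F$$

If $I \in M_n$, $C_k(I) = I \in M_{\binom{n}{k}}$

If $A \in M_n$ is nonsingular, $C_k(A)^{-1} = C_k(A^{-1})$

If $A \in M_{m,n}(F)$, $C_k(A^T) = C_k(A)^T$

and

If $A \in M_{m,n}(C)$, $C_k(A^*) = C_k(A)^*$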
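The example above, and the multiplicativity of the compound, can be reproduced with a few lines of code. The following is a small sketch in Python; `compound` is an illustrative name, and the second matrix `B` is an arbitrary choice of ours. `itertools.combinations` happens to generate index sets in lexicographic order, which matches the ordering convention used here.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (0.3.1)."""
    if not M:
        return 1  # empty determinant, by convention
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def compound(A, k):
    """k-th compound matrix: all k-by-k minors, index sets in lexicographic order."""
    m, n = len(A), len(A[0])
    row_sets = list(combinations(range(m), k))
    col_sets = list(combinations(range(n), k))
    return [
        [det([[A[i][j] for j in cols] for i in rows]) for cols in col_sets]
        for rows in row_sets
    ]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 2], [0, 1, 1], [3, 1, 0]]
print(compound(A, 2))  # [[-3, -6, -3], [-6, -12, -6], [-3, -6, -3]]
print(compound(matmul(A, B), 2) == matmul(compound(A, 2), compound(B, 2)))  # True
```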


0.8.2 The classical adjoint and the inverse. If $A \in M_n(F)$, the transposed matrix of cofactors $B = [b_{ij}] \in M_n(F)$ defined by

$$b_{ij} = (-1)^{i+j} \det A(\{j\}', \{i\}')$$

is called the (classical) adjoint of $A$ and is often denoted adj $A$. The term adjugate is sometimes used in place of adjoint to avoid confusion with the Hermitian adjoint $A^*$. Note that

$$\operatorname{adj} A = E\, C_{n-1}(A)^T E$$


where 


+1 


Lu. 


A calculation using the Laplace expansion for the determinant shows that

$$(\operatorname{adj} A)A = A(\operatorname{adj} A) = (\det A) I$$

Thus, if $A$ is nonsingular ($\det A \ne 0$),

$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A$$


Use of the adjoint is generally not a good way to calculate the inverse of a 
matrix numerically, but the adjoint can be useful as an analytic way to 
present the inverse. 
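For small matrices the adjoint can be formed directly from its definition and the identity $A(\operatorname{adj} A) = (\det A)I$ checked by multiplication. The following is an illustrative sketch only (0-based indices, hypothetical helper names), in keeping with the caveat that this is not a practical numerical method.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (0.3.1)."""
    if not M:
        return 1  # empty determinant, by convention
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def delete(A, i, j):
    """Submatrix of A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def adjugate(A):
    """Transposed matrix of cofactors: entry (i, j) uses the minor with
    row j and column i deleted."""
    n = len(A)
    return [
        [(-1) ** (i + j) * det(delete(A, j, i)) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2], [3, 4]]
print(adjugate(A))             # [[4, -2], [-3, 1]]
print(matmul(A, adjugate(A)))  # [[-2, 0], [0, -2]]
```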


0.8.3 Cramer’s rule. Cramer’s rule is one method of presenting the 
unique solution to the linear system Ax =b when Ae M,,(F) is nonsin- 
gular. It bears the same computational caveats as the adjoint presenta- 
tion of the inverse, and it is generally useful only when there is a need to 
present analytically a particular component of the solution. If $x_i$ is the $i$th component of the solution vector $x \in F^n$, then Cramer’s rule states that

$$x_i = \frac{\det(A \leftarrow_i b)}{\det A}$$

The notation $A \leftarrow_i b$ denotes that matrix in $M_n$ whose $i$th column is $b$ and whose remaining columns coincide with those of $A$. Cramer’s rule follows directly from the multiplicativity of the determinant. The system $Ax = b$ may be rewritten as

$$A(I \leftarrow_i x) = A \leftarrow_i b$$

and taking determinants of both sides (using multiplicativity) gives

$$(\det A) \det(I \leftarrow_i x) = \det(A \leftarrow_i b)$$

But $\det(I \leftarrow_i x) = x_i$, and the formula follows.
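Cramer’s rule can be written out directly for small systems. The sketch below is illustrative only, in keeping with the computational caveats above; `replace_column` plays the role of the matrix with its $i$th column replaced by $b$, and all names are our own.

```python
from fractions import Fraction


def det(M):
    """Determinant by Laplace expansion along the first row (0.3.1)."""
    if not M:
        return 1
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def replace_column(A, i, b):
    """The matrix whose i-th column is b and whose other columns are those of A."""
    return [row[:i] + [b_r] + row[i + 1:] for row, b_r in zip(A, b)]


def cramer_solve(A, b):
    """Solve Ax = b for nonsingular A, one component at a time."""
    d = det(A)
    if d == 0:
        raise ValueError("A is singular; Cramer's rule does not apply")
    return [Fraction(det(replace_column(A, i, b))) / d for i in range(len(A))]


print(cramer_solve([[2, 1], [1, 3]], [3, 5]))  # [Fraction(4, 5), Fraction(7, 5)]
```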


0.8.4 Minors of the inverse. An important fact, which generalizes the 
adjoint formula for the inverse of a nonsingular matrix, and which 
relates the minors of $A^{-1}$ to those of $A \in M_n(F)$, is the following:

$$\det A^{-1}(\alpha', \beta') = (-1)^{\left(\sum_{i \in \alpha} i + \sum_{j \in \beta} j\right)} \frac{\det A(\beta, \alpha)}{\det A}$$

For principal submatrices, this formula assumes the simple form

$$\det A^{-1}(\alpha') = \frac{\det A(\alpha)}{\det A}$$

0.8.5 Schur complements and determinantal formulae. Let $\alpha \subseteq \{1, \dots, n\}$ be an index set such that $A(\alpha)$ is nonsingular for a given matrix $A \in M_n(F)$. Denote the inverse of $A(\alpha)$ by $A(\alpha)^{-1}$. An important formula for $\det A$, based upon the 2-partition of $A$ using $\alpha$ and $\alpha'$, is

$$\det A = \det A(\alpha) \det[A(\alpha') - A(\alpha', \alpha)A(\alpha)^{-1}A(\alpha, \alpha')]$$

Notice that this generalizes the familiar formula for the determinant of a 2-by-2 matrix displayed in (0.3.1). The special matrix

$$A(\alpha') - A(\alpha', \alpha)A(\alpha)^{-1}A(\alpha, \alpha')$$

is called the Schur complement of $A(\alpha)$ in $A$. The Schur complement formula for $\det A$ may be verified via multiplication of

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} I & -A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix}$$

and identification of $A_{11}$ with $A(\alpha)$. Note that the Schur complement has already arisen in the partitioned form for $A^{-1}$ (0.7.3).
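The verification suggested above can be written out in a few lines. The following is a sketch of the computation, assuming $A_{11}$ is nonsingular and using the facts that the determinant is multiplicative (0.3.5) and that the determinant of a block triangular matrix is the product of the determinants of its diagonal blocks (0.9.4).

```latex
\begin{align*}
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} I & -A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix}
&= \begin{bmatrix} A_{11} & -A_{11}A_{11}^{-1}A_{12} + A_{12} \\
                   A_{21} & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{bmatrix}
 = \begin{bmatrix} A_{11} & 0 \\
                   A_{21} & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{bmatrix} \\[4pt]
\det A \cdot \det \begin{bmatrix} I & -A_{11}^{-1}A_{12} \\ 0 & I \end{bmatrix}
&= \det A_{11} \, \det\!\left[ A_{22} - A_{21}A_{11}^{-1}A_{12} \right] \\[4pt]
\det A &= \det A_{11} \, \det\!\left[ A_{22} - A_{21}A_{11}^{-1}A_{12} \right]
\end{align*}
```

The second factor on the left is block upper triangular with identity diagonal blocks, so its determinant is 1, which gives the last line.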


0.8.6 Sylvester’s identity. Let $\alpha \subseteq \{1, \dots, n\}$ be a fixed index set and let $B = [b_{ij}] \in M_{n-k}(F)$ be defined by

$$b_{ij} = \det A(\alpha \cup \{i\}, \alpha \cup \{j\})$$

where $k$ is the cardinality of $\alpha$, $i, j \in \{1, \dots, n\}$ are indices not contained in $\alpha$, and $A \in M_n(F)$. Another useful determinantal identity is

$$\det B = [\det A(\alpha)]^{n-k-1} \det A$$


0.8.7 Cauchy-Binet formula. This useful formula can be remembered because of its similarity in appearance to the formula for matrix multiplication. This is no accident, since it is equivalent to the multiplicativity of the compound matrix (0.8.1). Let $A \in M_{m,k}(F)$, $B \in M_{k,n}(F)$, and $C = AB$. Further, let $1 \le r \le \min\{m, k, n\}$, and let $\alpha \subseteq \{1, \dots, m\}$ and $\beta \subseteq \{1, \dots, n\}$ be index sets, each of cardinality $r$. An expression for the $\alpha, \beta$ minor of $C$ is

$$\det C(\alpha, \beta) = \sum_{\gamma} \det A(\alpha, \gamma) \det B(\gamma, \beta)$$

where the sum is taken over all index sets $\gamma \subseteq \{1, \dots, k\}$ of cardinality $r$.
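A direct check of the formula on a small rectangular example may help fix the notation. The following is a minimal sketch with 0-based index sets; the function names and the sample matrices are our own choices.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (0.3.1)."""
    if not M:
        return 1
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def submatrix(A, rows, cols):
    """A(alpha, beta): the rows in `rows` and the columns in `cols`."""
    return [[A[i][j] for j in cols] for i in rows]


def cauchy_binet_minor(A, B, alpha, beta):
    """The alpha, beta minor of AB, computed from minors of A and B only."""
    k = len(B)  # the inner dimension
    r = len(alpha)
    return sum(
        det(submatrix(A, alpha, gamma)) * det(submatrix(B, gamma, beta))
        for gamma in combinations(range(k), r)
    )


A = [[1, 2, 3], [4, 5, 6]]    # 2-by-3
B = [[1, 0], [0, 1], [1, 1]]  # 3-by-2
C = matmul(A, B)              # [[4, 5], [10, 11]]
print(cauchy_binet_minor(A, B, (0, 1), (0, 1)), det(C))  # -6 -6
```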


0.8.8 Relations among minors. Let $A \in M_{m,n}(F)$ be given and let a fixed index set $\alpha \subseteq \{1, \dots, m\}$ of cardinality $k$ be given. The minors

$$\det A(\alpha, \omega),$$

as $\omega \subseteq \{1, \dots, n\}$ runs over ordered index sets of cardinality $k$, are not algebraically independent since there are more minors than there are distinct entries among the submatrices. Quadratic relations are known among these minors. Let $i_1, i_2, \dots, i_k \in \{1, \dots, n\}$ be $k$ distinct indices, not necessarily in natural order, and let

$$A(\alpha; i_1, \dots, i_k)$$

denote the matrix whose rows are indicated by $\alpha$ and whose $j$th column is column $i_j$ of $A(\alpha, \{1, \dots, n\})$. The difference between this and our prior notation is that columns may not occur in natural order as in $A(\{1, 3\}; 4, 2)$, whose first column has the 1, 4 and 3, 4 entries of $A$. We then have the relations

$$\det A(\alpha; i_1, \dots, i_k) \det A(\alpha; j_1, \dots, j_k) = \sum_{t=1}^{k} \det A(\alpha; i_1, \dots, i_{s-1}, j_t, i_{s+1}, \dots, i_k) \det A(\alpha; j_1, \dots, j_{t-1}, i_s, j_{t+1}, \dots, j_k)$$

for each $s = 1, \dots, k$ and all sequences of distinct indices

$$i_1, \dots, i_k \in \{1, \dots, n\} \quad \text{and} \quad j_1, \dots, j_k \in \{1, \dots, n\}$$


0.9 Special types of matrices 


Certain matrices of special form arise frequently and have important 
properties. Some of these are worth cataloging here for reference and 
terminology. 


0.9.1 Diagonal matrices. The matrix $D = [d_{ij}] \in M_n$ is called diagonal if $d_{ij} = 0$ whenever $j \ne i$. Conventionally, we denote such a matrix as $D = \operatorname{diag}(d_{11}, \dots, d_{nn})$ or $D = \operatorname{diag} d$, where $d$ is the vector of diagonal entries of $D$. If all the diagonal entries of a diagonal matrix are positive (nonnegative) real numbers, we refer to it as a positive (nonnegative) diagonal matrix. Note that the term positive diagonal matrix means that the matrix is diagonal in addition to having positive diagonal entries; it does not refer to a general matrix all of whose diagonal entries happen to be positive. The identity matrix $I \in M_n$ is an example of a positive diagonal matrix. A diagonal matrix $D \in M_n$ is called a scalar matrix if the diagonal entries of $D$ are all equal; that is, $D = \alpha I$ for some $\alpha \in C$. Left or right multiplication of a matrix by a scalar matrix has the same effect as multiplying it by the corresponding scalar.

The determinant of a diagonal matrix is just the product of its diagonal entries: $\det D = \prod_{i=1}^{n} d_{ii}$. Thus, a diagonal matrix is nonsingular if and only if all its diagonal entries are nonzero. Left multiplication of $A \in M_n$ by a diagonal matrix $D$, that is, $DA$, multiplies the rows of $A$ by the diagonal entries of $D$ (the $i$th row of $A$ is multiplied by $d_{ii}$, $i = 1, \dots, n$). Right multiplication by $D$, that is, $AD$, multiplies the columns of $A$ by the diagonal entries of $D$. Thus, all diagonal matrices commute with each other under multiplication, and a diagonal matrix $D$ commutes with a given matrix $A = [a_{ij}] \in M_n$ if and only if $a_{ij} = 0$ whenever the $i$th and $j$th diagonal entries of $D$ differ. The product of two diagonal matrices is just the diagonal matrix of pair-wise products of their respective diagonal entries and similarly for positive integer powers of a single diagonal matrix.


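0.9.2 Block diagonal matrices. A matrix $A \in M_n$ of the form

$$A = \begin{bmatrix} A_{11} & & 0 \\ & \ddots & \\ 0 & & A_{kk} \end{bmatrix}$$

in which $A_{ii} \in M_{n_i}$, $i = 1, \dots, k$, and $\sum_{i=1}^{k} n_i = n$, is called block diagonal. Notationally, such a matrix is often indicated as $A = A_{11} \oplus A_{22} \oplus \cdots \oplus A_{kk}$ or, more briefly, $\bigoplus_{i=1}^{k} A_{ii}$; this is called the direct sum of the matrices $A_{11}, \dots, A_{kk}$. Thinking in terms of partitioned multiplication, many properties of block diagonal matrices generalize those of diagonal matrices. For example, $\det\left(\bigoplus_{i=1}^{k} A_{ii}\right) = \prod_{i=1}^{k} \det A_{ii}$, so that $A = \bigoplus_{i=1}^{k} A_{ii}$ is nonsingular if and only if each $A_{ii}$ is nonsingular, $i = 1, \dots, k$. Furthermore, two direct sums $A = \bigoplus_{i=1}^{k} A_{ii}$ and $B = \bigoplus_{i=1}^{k} B_{ii}$, in which each pair $A_{ii}$, $B_{ii}$ is the same size, commute if and only if $A_{ii}$ and $B_{ii}$ commute, $i = 1, \dots, k$. Also, $\operatorname{rank}\left(\bigoplus_{i=1}^{k} A_{ii}\right) = \sum_{i=1}^{k} \operatorname{rank} A_{ii}$.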


0.9.3 Triangular matrices. The matrix $T = [t_{ij}] \in M_n$ is said to be upper triangular if $t_{ij} = 0$ whenever $j < i$. If $t_{ij} = 0$ whenever $j \le i$, then $T$ is said to be strictly upper triangular. Analogously, $T$ is said to be lower triangular (or strictly lower triangular) if its transpose is upper triangular (or strictly upper triangular). Triangular matrices are like diagonal matrices in that the determinant of a triangular matrix is the product of its diagonal entries. However, triangular matrices (of either sort) do not necessarily commute with other triangular matrices. Left multiplication of $A \in M_n$ by a lower triangular matrix $L$, that is, $LA$, replaces the $i$th row of $A$ by a
linear combination of the first through ith rows of A. Sometimes the 
terms right (in place of upper) and left (in place of lower) are used in 
reference to triangular matrices. The rank of a triangular matrix is at 
least, and can be greater than, the number of nonzero entries on the main 
diagonal. 


0.9.4 Block triangular matrices. A matrix $A \in M_n$ of the form

$$A = \begin{bmatrix} A_{11} & & * \\ & \ddots & \\ 0 & & A_{kk} \end{bmatrix}$$

in which $A_{ii} \in M_{n_i}$, $i = 1, \dots, k$, $\sum_{i=1}^{k} n_i = n$, and “*” denotes any entry,
is called block upper triangular. Block lower triangular, strictly block 
lower triangular, and strictly block upper triangular may be defined sim- 
ilarly. The determinant of a block triangular matrix is the product of the 
determinants of the diagonal blocks. The rank of a block triangular 
matrix is at least, and can be greater than, the sum of the ranks of the 
diagonal blocks. 


0.9.5 Permutation matrices. A matrix $P \in M_n$ is called a permutation matrix if exactly one entry in each row and column is equal to 1, and all other entries are 0. Multiplication by such matrices effects a permutation of the rows or columns of the object multiplied. For example,

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \in M_3$$

is a permutation matrix, and

$$P \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$$

is a permutation of the rows (components) of the vector $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, namely, the permutation that sends the first item to the second position, sends the second item to the first position, and leaves the third item in the third position. In general, left multiplication of a matrix $A \in M_{m,n}$ by a permutation matrix $P \in M_m$ permutes the rows of $A$, while right multiplication of a matrix $A \in M_{m,n}$ by a permutation matrix $P \in M_n$ permutes the
columns of A. The matrix that carries out a type 1 elementary operation 
(0.3.3) is an example of a special type of permutation matrix called a 
transposition. 

The determinant of a permutation matrix is $\pm 1$ [exactly one summand in the formula of (0.3.2) is nonzero], so that permutation matrices are necessarily nonsingular. Although permutation matrices do not, in general, commute under multiplication, the product of two permutation matrices is again a permutation matrix. Since the identity is a permutation matrix and $P^T = P^{-1}$ for every permutation matrix $P$, the permutation matrices constitute a subgroup of $GL(n, C)$, the group of nonsingular matrices in $M_n$; this subgroup has finite cardinality $n!$. In fact, any permutation matrix is a product of transpositions.

Since $P^T = P^{-1}$ permutes columns in the same way that the permutation matrix $P \in M_n$ permutes rows, the transformation $A \to PAP^T$ permutes the rows and columns of $A \in M_n$ in the same way. In the context of linear equations with coefficient matrix $A$, this transformation amounts to renumbering the variables. A matrix $A \in M_n$ such that $PAP^T$ is triangular for some permutation matrix $P$ is called essentially triangular.
These matrices have much in common with triangular matrices. 
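Several of the facts in (0.9.5) can be checked mechanically. The following is a small Python sketch, assuming a permutation is given as a list `perm` of 0-based indices in which row $i$ of $P$ has its 1 in column `perm[i]`; this encoding and the helper names are our own conventions, not the text's.

```python
def permutation_matrix(perm):
    """Row i has its single 1 in column perm[i], so (P x)[i] = x[perm[i]]."""
    n = len(perm)
    return [[int(j == perm[i]) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_permutation_matrix(M):
    """Exactly one 1 in each row and column, all other entries 0."""
    rows_ok = all(sorted(row) == [0] * (len(row) - 1) + [1] for row in M)
    cols_ok = all(sorted(col) == [0] * (len(col) - 1) + [1] for col in zip(*M))
    return rows_ok and cols_ok


P = permutation_matrix([1, 0, 2])             # the example matrix P above
print(matmul(P, [[1], [2], [3]]))             # [[2], [1], [3]]
print(matmul(P, transpose(P)))                # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matmul(matmul(P, A), transpose(P)))     # [[5, 4, 6], [2, 1, 3], [8, 7, 9]]
```

In the last product the rows and the columns of $A$ are permuted in the same way, so the diagonal entries of $A$ stay on the diagonal.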


0.9.6  Circulant matrices. A matrix A € M, of the form 


rc 


ay Al seeeseess an 
an Q an ->  An-l 

A= Ay —1 dn Gy ay —2 
a2 A, 1. An Q 


is called a circulant matrix. Each row is just the previous row cycled for- 
ward one step, so that the entries in each row are just a cyclic permuta- 
tion of those in the first. The permutation matrix 


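$$C = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
\vdots & 0 & 1 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & 0 \\
0 & & & 0 & 1 \\
1 & 0 & \cdots & \cdots & 0
\end{bmatrix}$$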


is called the basic circulant permutation matrix. A matrix $A \in M_n$ can be
written in the form

$$A = \sum_{k=0}^{n-1} a_{k+1} C^k$$

if and only if $A$ is a circulant. Here $C^0 = I = C^n$, and the coefficients
$a_1, a_2, \ldots, a_n$ are just the entries of the first row of $A$. Because
of this representation, circulant matrices have very nice structure, which can
be related to $C$. Since $C^n = I$, the product of two circulants is again a
circulant. Furthermore, circulants commute under multiplication. There are
generalizations in which, for example, the rows are cycled forward (or
backward) a fixed number of steps that is greater than 1.
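As an illustration of the representation of a circulant as a polynomial in the basic circulant permutation matrix, here is a small sketch in plain Python. The function names and the sample first rows are assumptions chosen for the example.

```python
def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def basic_circulant(n):
    """Basic circulant permutation matrix: ones in positions (i, i+1 mod n)."""
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]


def circulant(first_row):
    """Build the circulant as the sum of a_{k+1} C^k for k = 0, ..., n-1."""
    n = len(first_row)
    C = basic_circulant(n)
    A = [[0] * n for _ in range(n)]
    power = identity(n)                 # holds C^k, starting with C^0 = I
    for a in first_row:
        for i in range(n):
            for j in range(n):
                A[i][j] += a * power[i][j]
        power = matmul(power, C)
    return A


A = circulant([1, 2, 3])
B = circulant([0, 5, 7])
AB = matmul(A, B)
```

With these sample rows, `A` has rows (1, 2, 3), (3, 1, 2), (2, 3, 1); the product `AB` equals `BA` and is again the circulant generated by its own first row.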


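0.9.7 Toeplitz matrices. A matrix $A = [a_{ij}] \in M_{n+1}$ of the form

$$A = \begin{bmatrix}
a_0 & a_1 & a_2 & \cdots & a_n \\
a_{-1} & a_0 & a_1 & \cdots & a_{n-1} \\
a_{-2} & a_{-1} & a_0 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & a_1 \\
a_{-n} & a_{-n+1} & \cdots & a_{-1} & a_0
\end{bmatrix}$$

is called a Toeplitz matrix. The general term is $a_{ij} = a_{j-i}$ for some
given sequence $a_{-n}, a_{-n+1}, \ldots, a_{-1}, a_0, a_1, a_2, \ldots, a_{n-1}, a_n \in \mathbf{C}$.
The entries of $A$ are constant down the diagonals parallel to the main
diagonal. The Toeplitz matrices

$$B = \begin{bmatrix}
0 & 1 & & 0 \\
 & 0 & \ddots & \\
 & & \ddots & 1 \\
0 & & & 0
\end{bmatrix}
\quad \text{and} \quad
F = \begin{bmatrix}
0 & & & 0 \\
1 & 0 & & \\
 & \ddots & \ddots & \\
0 & & 1 & 0
\end{bmatrix}$$

are called the "backward shift" and "forward shift" because of their
effect on the elements of the standard basis $\{e_1, \ldots, e_{n+1}\}$. A
matrix $A \in M_{n+1}$ can be written in the form

$$A = \sum_{k=1}^{n} a_{-k} F^k + \sum_{k=0}^{n} a_k B^k$$

if and only if $A$ is a Toeplitz matrix. Toeplitz matrices arise naturally in
problems involving trigonometric moments.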


0.9.8 Hankel matrices. A matrix $A \in M_{n+1}$ of the form

$$A = \begin{bmatrix}
a_0 & a_1 & a_2 & \cdots & a_n \\
a_1 & a_2 & a_3 & \cdots & a_{n+1} \\
a_2 & a_3 & & & a_{n+2} \\
\vdots & & & & \vdots \\
a_n & a_{n+1} & a_{n+2} & \cdots & a_{2n}
\end{bmatrix}$$

is called a Hankel matrix. The general term is $a_{ij} = a_{i+j-2}$ for some
given sequence $a_0, a_1, a_2, \ldots, a_{2n-1}, a_{2n}$. The entries of $A$
are constant along the diagonals perpendicular to the main diagonal. Hankel
matrices arise naturally in problems involving power moments. Notice that if

$$P = \begin{bmatrix} 0 & & 1 \\ & \iddots & \\ 1 & & 0 \end{bmatrix}$$

is the "backward identity" permutation, then $PT$ is a Hankel matrix for any
Toeplitz matrix $T$, and $PH$ is a Toeplitz matrix for any Hankel matrix $H$.
Since $P = P^T = P^{-1}$ and Hankel matrices are symmetric, this means that
any Toeplitz matrix is a product of two symmetric matrices ($P$ and a
Hankel matrix).
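The relation between Toeplitz and Hankel matrices through the backward identity can be seen on a 3-by-3 case. This is a sketch under assumed names (`toeplitz`, `backward_identity`, `is_hankel`) with a made-up sequence stored in a dictionary indexed from $-n$ to $n$.

```python
def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def toeplitz(a, n):
    """(n+1)-by-(n+1) Toeplitz matrix whose (i, j) entry is a[j - i]."""
    return [[a[j - i] for j in range(n + 1)] for i in range(n + 1)]


def backward_identity(m):
    """Permutation matrix with ones on the antidiagonal."""
    return [[1 if i + j == m - 1 else 0 for j in range(m)] for i in range(m)]


def is_hankel(H):
    """True if entries are constant along each antidiagonal (depend on i + j only)."""
    m = len(H)
    return all(H[i][j] == H[i + 1][j - 1]
               for i in range(m - 1) for j in range(1, m))


a = {-2: 5, -1: 4, 0: 1, 1: 2, 2: 3}    # hypothetical sample sequence
T = toeplitz(a, 2)
P = backward_identity(3)
H = matmul(P, T)                         # reverses the order of the rows of T
```

Since $P^2 = I$, multiplying `H` by `P` on the left recovers `T`, which exhibits `T` as the product of the two symmetric matrices `P` and `H`.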


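0.9.9 Hessenberg matrices. The matrix $A = [a_{ij}] \in M_n$ is said to be in
upper Hessenberg form or to be an upper Hessenberg matrix if $a_{ij} = 0$
for $i > j + 1$:

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & & a_{1n} \\
a_{21} & a_{22} & & & \vdots \\
0 & a_{32} & \ddots & & \\
\vdots & \ddots & \ddots & \ddots & \\
0 & \cdots & 0 & a_{n,n-1} & a_{nn}
\end{bmatrix}$$

The matrix $A \in M_n$ is called lower Hessenberg if $A^T$ is upper Hessenberg.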


0.9.10 Tridiagonal matrices. A matrix $A = [a_{ij}] \in M_n$ that is both upper
and lower Hessenberg is called tridiagonal, that is, $A$ is tridiagonal if
$a_{ij} = 0$ whenever $|i - j| > 1$:

$$A = \begin{bmatrix}
a_{11} & a_{12} & & & 0 \\
a_{21} & a_{22} & \ddots & & \\
 & a_{32} & \ddots & \ddots & \\
 & & \ddots & \ddots & a_{n-1,n} \\
0 & & & a_{n,n-1} & a_{nn}
\end{bmatrix}$$

The determinant of a tridiagonal matrix is easy to calculate inductively.
Note that

$$\det A(\{1, 2, \ldots, k+1\}) = a_{k+1,k+1} \det A(\{1, \ldots, k\}) - a_{k+1,k}\, a_{k,k+1} \det A(\{1, \ldots, k-1\}).$$

0.9.11 Matrices and Lagrange interpolation. A Vandermonde matrix
$A \in M_n(\mathbf{F})$ is a matrix of the form

$$A = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{bmatrix} \tag{0.9.11.1}$$

where $x_1, x_2, \ldots, x_n \in \mathbf{F}$; that is, $A = [a_{ij}]$ with
$a_{ij} = x_i^{j-1}$. It is a fact that

$$\det A = \prod_{\substack{i,j=1 \\ i>j}}^{n} (x_i - x_j) \tag{0.9.11.2}$$

so a Vandermonde matrix is nonsingular if and only if the $n$ parameters
$x_1, x_2, \ldots, x_n$ are distinct.

The Vandermonde matrix arises in the interpolation problem of finding a
polynomial $p(x) = a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_1 x + a_0$ of
degree at most $n - 1$ with coefficients from the field $\mathbf{F}$ such that

$$\begin{aligned}
p(x_1) &= a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_{n-1} x_1^{n-1} = y_1 \\
p(x_2) &= a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_{n-1} x_2^{n-1} = y_2 \\
&\;\;\vdots \\
p(x_n) &= a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_{n-1} x_n^{n-1} = y_n
\end{aligned} \tag{0.9.11.3}$$

where $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ are given elements of
$\mathbf{F}$. The interpolation conditions (0.9.11.3) are a system of $n$
equations for the $n$ unknown coefficients $a_0, a_1, \ldots, a_{n-1}$, and
they have the form $Aa = y$, where $a = [a_0, a_1, \ldots, a_{n-1}]^T \in \mathbf{F}^n$,
$y = [y_1, y_2, \ldots, y_n]^T \in \mathbf{F}^n$, and $A \in M_n(\mathbf{F})$
is the Vandermonde matrix (0.9.11.1). This interpolation problem always
has a solution if the points $x_1, x_2, \ldots, x_n$ are distinct, for $A$ is
nonsingular in this event.

If the points $x_1, x_2, \ldots, x_n$ are distinct, the coefficients of the
interpolating polynomial could in principle be obtained by solving the
system (0.9.11.3), but it is usually more useful to represent the
interpolating polynomial $p(x)$ in terms of the special Lagrange
interpolating polynomials

$$L_i(x) = \frac{\prod_{j \neq i} (x - x_j)}{\prod_{j \neq i} (x_i - x_j)}, \qquad i = 1, 2, \ldots, n$$

Each polynomial $L_i(x)$ has degree $n - 1$ and has the property that
$L_i(x_k) = 0$ if $k \neq i$, but $L_i(x_i) = 1$. Thus, we have Lagrange's
interpolation formula

$$p(x) = y_1 L_1(x) + y_2 L_2(x) + \cdots + y_n L_n(x) \tag{0.9.11.4}$$

for a polynomial $p(x)$ of degree at most $n - 1$ that satisfies the equations
(0.9.11.3).
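Lagrange's interpolation formula can be evaluated directly without solving the Vandermonde system. The sketch below uses exact rational arithmetic from the standard `fractions` module; the function names and the sample data (three points lying on $x^2 + x + 1$) are assumptions for the example.

```python
from fractions import Fraction


def lagrange_basis(xs, i, x):
    """Value at x of the Lagrange polynomial that is 1 at xs[i] and 0 at the other points."""
    num = Fraction(1)
    den = Fraction(1)
    for j, xj in enumerate(xs):
        if j != i:
            num *= x - xj
            den *= xs[i] - xj       # nonzero because the points are distinct
    return num / den


def lagrange_interpolate(xs, ys, x):
    """Value at x of the interpolating polynomial of degree at most n - 1."""
    return sum(y * lagrange_basis(xs, i, x) for i, y in enumerate(ys))


xs = [Fraction(0), Fraction(1), Fraction(2)]    # distinct sample points
ys = [Fraction(1), Fraction(3), Fraction(7)]    # values of x^2 + x + 1 at those points
value_at_3 = lagrange_interpolate(xs, ys, 3)
```

Because a polynomial of degree at most 2 is determined by three interpolation conditions, `value_at_3` agrees with $3^2 + 3 + 1$.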


0.10 Change of basis 

Let $V$ be an $n$-dimensional vector space over the field $\mathbf{F}$, and let
$\mathcal{B}_1 = \{v_1, v_2, \ldots, v_n\}$ be a basis for $V$. If $x \in V$ is
any given vector, then there exists some representation
$x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$ because the set
$\mathcal{B}_1$ spans $V$. If there were some other representation
$x = \beta_1 v_1 + \beta_2 v_2 + \cdots + \beta_n v_n$ in the same basis, then

$$0 = x - x = (\alpha_1 - \beta_1) v_1 + (\alpha_2 - \beta_2) v_2 + \cdots + (\alpha_n - \beta_n) v_n$$

from which it follows that all $\alpha_i - \beta_i = 0$ because the set
$\mathcal{B}_1$ is independent. Given the basis $\mathcal{B}_1$, the linear
mapping

$$x \to [x]_{\mathcal{B}_1} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{bmatrix},
\qquad \text{where } x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$$

from $V$ to $\mathbf{F}^n$ is well defined, one-to-one, and onto. The scalars
$\alpha_i$ are called the coordinates of $x$ with respect to the basis
$\mathcal{B}_1$, and the column vector $[x]_{\mathcal{B}_1}$ is the unique
$\mathcal{B}_1$ coordinate representation of $x$.

Let $T: V \to V$ be a given linear transformation. The action of $T$ on any
$x \in V$ is determined once one knows the $n$ vectors $Tv_1, Tv_2, \ldots, Tv_n$,
because any $x \in V$ has a unique representation
$x = \alpha_1 v_1 + \cdots + \alpha_n v_n$ and
$Tx = T(\alpha_1 v_1 + \cdots + \alpha_n v_n) = T(\alpha_1 v_1) + \cdots + T(\alpha_n v_n) = \alpha_1 Tv_1 + \cdots + \alpha_n Tv_n$
by linearity. Thus, the value of $Tx$ is determined once $[x]_{\mathcal{B}_1}$
is known.

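Let $\mathcal{B}_2 = \{w_1, w_2, \ldots, w_n\}$ be another, possibly different,
basis for $V$, and suppose that the $\mathcal{B}_2$ coordinate representation
of $Tv_j$ is

$$[Tv_j]_{\mathcal{B}_2} = \begin{bmatrix} t_{1j} \\ \vdots \\ t_{nj} \end{bmatrix}, \qquad j = 1, 2, \ldots, n$$

Then for any $x \in V$ we have

$$[Tx]_{\mathcal{B}_2} = \Bigl[\sum_{j=1}^{n} \alpha_j Tv_j\Bigr]_{\mathcal{B}_2}
= \sum_{j=1}^{n} \alpha_j [Tv_j]_{\mathcal{B}_2}
= \sum_{j=1}^{n} \alpha_j \begin{bmatrix} t_{1j} \\ \vdots \\ t_{nj} \end{bmatrix}
= \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & & \vdots \\ t_{n1} & \cdots & t_{nn} \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix}$$

The $n$-by-$n$ array $[t_{ij}]$ depends on $T$ and on the choice of the bases
$\mathcal{B}_1$ and $\mathcal{B}_2$, but it does not depend on $x$. We define
the $\mathcal{B}_1$-$\mathcal{B}_2$ basis representation of $T$ to be

$${}_{\mathcal{B}_2}[T]_{\mathcal{B}_1} = \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ \vdots & & \vdots \\ t_{n1} & \cdots & t_{nn} \end{bmatrix}
= \bigl[\, [Tv_1]_{\mathcal{B}_2} \ \cdots \ [Tv_n]_{\mathcal{B}_2} \,\bigr]$$

We have just shown that $[Tx]_{\mathcal{B}_2} = {}_{\mathcal{B}_2}[T]_{\mathcal{B}_1} [x]_{\mathcal{B}_1}$
for any $x \in V$. In practice, the case $\mathcal{B}_2 = \mathcal{B}_1$ is the
most common one for presenting a basis representation of $T$;
${}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}$ is called the $\mathcal{B}_1$ basis
representation of $T$.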

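Consider the identity linear transformation $I: V \to V$ defined by $Ix = x$
for all $x$. Then

$$[x]_{\mathcal{B}_2} = [Ix]_{\mathcal{B}_2}
= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1} [x]_{\mathcal{B}_1}
= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1} [Ix]_{\mathcal{B}_1}
= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[I]_{\mathcal{B}_2} [x]_{\mathcal{B}_2}$$

for all $x \in V$. By successively choosing $x = w_1, w_2, \ldots, w_n$, this
identity permits us to identify each column of
${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$
and shows that

$${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}
= \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} = I$$

We commit a common abuse of notation by using $I$ to denote the $n$-by-$n$
identity matrix as well as the identity linear transformation. If we do the
same calculation starting with $[x]_{\mathcal{B}_1} = [Ix]_{\mathcal{B}_1} = \cdots$,
we also find that

$${}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}\, {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1} = I$$

Thus, the matrix ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$ is the matrix inverse
of the matrix ${}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$. If we write
$S = {}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$, then
$S^{-1} = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$. Thus, every matrix of the form
${}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$ is invertible. Conversely, every
invertible matrix $S = [s_1 \ s_2 \ \cdots \ s_n] \in M_n(\mathbf{F})$ is of
the form ${}_{\mathcal{B}_1}[I]_{\mathcal{B}}$ for some basis $\mathcal{B}$. We
may take $\mathcal{B}$ to be the vectors
$\{\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_n\}$ defined by
$[\tilde{s}_i]_{\mathcal{B}_1} = s_i$, $i = 1, 2, \ldots, n$. The set
$\mathcal{B}$ is independent because $S$ is invertible.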

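Notice that

$${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}
= \bigl[\, [Iv_1]_{\mathcal{B}_2} \ \cdots \ [Iv_n]_{\mathcal{B}_2} \,\bigr]
= \bigl[\, [v_1]_{\mathcal{B}_2} \ \cdots \ [v_n]_{\mathcal{B}_2} \,\bigr]$$

so ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$ expresses the elements of the basis
$\mathcal{B}_1$ in terms of the basis $\mathcal{B}_2$. Now let $x \in V$ and
compute

$$\begin{aligned}
{}_{\mathcal{B}_2}[T]_{\mathcal{B}_2} [x]_{\mathcal{B}_2}
&= [Tx]_{\mathcal{B}_2} = [I(Tx)]_{\mathcal{B}_2}
 = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1} [Tx]_{\mathcal{B}_1} \\
&= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[T]_{\mathcal{B}_1} [x]_{\mathcal{B}_1}
 = {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[T]_{\mathcal{B}_1} [Ix]_{\mathcal{B}_1} \\
&= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[I]_{\mathcal{B}_2} [x]_{\mathcal{B}_2}
\end{aligned}$$

By choosing $x = w_1, w_2, \ldots, w_n$ successively, we conclude that

$${}_{\mathcal{B}_2}[T]_{\mathcal{B}_2}
= {}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[T]_{\mathcal{B}_1}\, {}_{\mathcal{B}_1}[I]_{\mathcal{B}_2}$$

This identity shows how the basis representation of $T$ changes if the basis
used to compute the representation is changed. For this reason, the
matrix ${}_{\mathcal{B}_2}[I]_{\mathcal{B}_1}$ is called the
$\mathcal{B}_1$-$\mathcal{B}_2$ change of basis matrix.

Any matrix $A \in M_n(\mathbf{F})$ is a basis representation of some linear
transformation $T: V \to V$, for if $\mathcal{B}$ is any basis of $V$, we can
determine $Tx$ by $[Tx]_{\mathcal{B}} = A[x]_{\mathcal{B}}$. One computes
easily that, for this $T$, ${}_{\mathcal{B}}[T]_{\mathcal{B}} = A$.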
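The change-of-basis identity can be tried on a concrete 2-by-2 case. In this sketch the first basis is taken to be the standard basis of the plane and the second basis consists of the vectors (1, 1) and (1, -1); these choices, the sample matrix, and the helper names are all assumptions made for illustration. Exact arithmetic comes from the standard `fractions` module.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(S):
    """Inverse of a nonsingular 2-by-2 matrix by the adjugate formula."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Representation of a sample transformation T in the first (standard) basis.
A = [[Fraction(2), Fraction(1)],
     [Fraction(1), Fraction(2)]]

# Columns hold the first-basis coordinates of the second basis vectors w1, w2.
S = [[Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(-1)]]

# Representation of the same T in the second basis: S^{-1} A S.
T_second_basis = matmul(matmul(inverse_2x2(S), A), S)
```

For this sample, the second basis happens to consist of vectors that `A` merely rescales, so the new representation comes out diagonal with entries 3 and 1.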


CHAPTER 1 


Eigenvalues, eigenvectors, and similarity 


1.0 Introduction 


In this and all the following chapters, we motivate some key issues dis- 
cussed in the chapter with examples of how they arise, either conceptually 
or in application. 


1.0.1 Change of basis and similarity. Every invertible matrix is a
change-of-basis matrix, and every change-of-basis matrix is invertible
[see Section (0.10)]. Thus, if $\mathcal{B}$ is a given basis of a vector space
$V$, if $T$ is a given linear transformation on $V$, and if
$A = {}_{\mathcal{B}}[T]_{\mathcal{B}}$ is the $\mathcal{B}$ basis
representation of $T$, the set of all possible basis representations of $T$ is

$$\begin{aligned}
&\{\, {}_{\mathcal{B}_1}[I]_{\mathcal{B}}\, {}_{\mathcal{B}}[T]_{\mathcal{B}}\, {}_{\mathcal{B}}[I]_{\mathcal{B}_1} : \mathcal{B}_1 \text{ is a basis of } V \,\} \\
&\quad = \{\, S^{-1}AS : S \in M_n(\mathbf{F}) \text{ is an invertible matrix} \,\}
\end{aligned}$$

This is just the set of all matrices that are similar to the given matrix $A$.
Similar but not identical matrices are therefore just different basis
representations of a single linear transformation.

One would expect similar matrices to share many important properties
- at least, those properties that are intrinsic to the underlying linear
transformation - and this is an important theme in linear algebra. It is
often useful to step back from a question about a matrix to a question
about some intrinsic property of the linear transformation of which it is
only one of many possible representations.

The notion of similarity is a key concept in this chapter. 




1.0.2 Constrained extrema and eigenvalues. A second key concept in
this chapter is the notion of eigenvector and eigenvalue. We shall see that
nonzero vectors $x$ such that $Ax$ is a multiple of $x$ play a major role in
analyzing the structure of a general matrix or linear transformation, but
such vectors arise in the more elementary context of maximizing (or
minimizing) a real symmetric quadratic form subject to a geometric
constraint:

$$\text{Maximize } x^T A x, \quad \text{subject to } x \in \mathbf{R}^n, \ x^T x = 1,$$

in which $A^T = A \in M_n(\mathbf{R})$ is given. A conventional approach to
such a constrained optimization problem is to introduce the Lagrangian
$L = x^T A x - \lambda x^T x$. Necessary conditions for an extremum then are

$$0 = \nabla L = 2(Ax - \lambda x)$$

Thus, if a vector $x \in \mathbf{R}^n$ with $x^T x = 1$ (and hence $x \neq 0$)
is to be an extremum of $x^T A x$, it must necessarily satisfy the equation
$Ax = \lambda x$, and hence $Ax$ is a multiple of $x$. Such a pair
$\lambda, x$ is called an eigenvalue, eigenvector pair.


Problems 


1. Explain why the constrained extremum problem in (1.0.2) must have
a solution, and conclude that every real symmetric matrix has at least one
real eigenvalue. Hint: Apply Weierstrass's theorem (Appendix E) to the
continuous function $f(x) = x^T A x$.


2. Let $A \in M_n(\mathbf{R})$ be symmetric ($A^T = A$). Show that the solution to

$$\text{Maximize } x^T A x, \quad \text{subject to } x^T x = 1$$

is the largest eigenvalue of $A$.


1.1 The eigenvalue-eigenvector equation 


1.1.1 Notation. By $M_n(\mathbf{F})$ we denote the $n$-by-$n$ matrices over a
field $\mathbf{F}$, usually the real numbers $\mathbf{R}$ or the complex
numbers $\mathbf{C}$. Most often, the facts discussed are valid in the setting
of complex-entried matrices, in which case $M_n(\mathbf{C})$ is abbreviated as
$M_n$. For the reader uninterested in the generality of complex matrices, it
will seldom make a substantial difference in the exposition, the algebra, or
the facts if the material is interpreted in terms of the real numbers. We
caution, however, that there are major differences between $\mathbf{R}$ and
$\mathbf{C}$, often having to do with roots of polynomials and other
flexibility associated with the "larger" complex field. Often, a real-entried
matrix may best be thought of as a complex-entried matrix with restricted
entries. Recall also that the set (vector space) of all real-entried
(respectively complex-entried) $n$-vectors is denoted by $\mathbf{R}^n$
(respectively $\mathbf{C}^n$), both interpreted as column vectors. Finally,
the transpose (0.2.5) of $A = [a_{ij}] \in M_n(\mathbf{F})$ is the matrix
$[a_{ji}] \in M_n(\mathbf{F})$, denoted by $A^T$, and, if
$\mathbf{F} \subseteq \mathbf{C}$, the Hermitian adjoint is the conjugate
transpose $[\bar{a}_{ji}]$ of $A$, denoted by $A^*$. Similarly, if
$x \in \mathbf{F}^n$, $x^T$ denotes the row vector with the same entries as
$x$, and, if $\mathbf{F} \subseteq \mathbf{C}$, $x^*$ denotes the row vector
whose entries are the complex conjugates of those of $x$. Here the overbar
denotes the complex conjugate (see Appendix A) of a complex scalar or the
component-wise complex conjugate of a vector or matrix.

A matrix $A \in M_n$ is thought of as a linear transformation from
$\mathbf{C}^n$ into $\mathbf{C}^n$ (with respect to some given basis of
$\mathbf{C}^n$), but it is also useful to think of it as an array of numbers.
It is the interplay between these two concepts of $A$, and what the array of
numbers tells us about the linear transformation, that is the essence of
matrix theory and a key to applications. Perhaps the single most important
concept in matrix theory is the set of $n$ numbers $\sigma(A)$ that we
associate with $A$, its eigenvalues.


1.1.2 Definition. If $A \in M_n$ and $x \in \mathbf{C}^n$, we consider the equation

$$Ax = \lambda x, \qquad x \neq 0 \tag{1.1.3}$$

where $\lambda$ is a scalar. If a scalar $\lambda$ and a nonzero vector $x$
happen to satisfy this equation, then $\lambda$ is called an eigenvalue of $A$
and $x$ is called an eigenvector of $A$ associated with $\lambda$. Notice that
the two occur inextricably as a pair, and that an eigenvector cannot be the
zero vector.


1.1.4 Definition. The set of all $\lambda \in \mathbf{C}$ that are eigenvalues
of $A \in M_n$ is called the spectrum of $A$ and is denoted by $\sigma(A)$.
The spectral radius of $A$ is the nonnegative real number
$\rho(A) = \max\{|\lambda| : \lambda \in \sigma(A)\}$. This is just the radius
of the smallest disc centered at the origin in the complex plane that
includes all the eigenvalues of $A$.


Exercise. If $x$ is an eigenvector associated with the eigenvalue $\lambda$ of
$A$, show that any nonzero scalar multiple of $x$ is an eigenvector also.


Even if they had no other importance, eigenvalues and eigenvectors 
would be interesting algebraically, since, according to (1.1.3), the eigen- 
vectors are just those vectors such that multiplication by A has a very 
simple form - the same as multiplication by a scalar (the eigenvalue). 


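Example. Consider the matrix

$$A = \begin{bmatrix} 7 & -2 \\ 4 & 1 \end{bmatrix} \in M_2$$

Then we have $3 \in \sigma(A)$ with $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ as
an associated eigenvector since

$$A \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

Also, $5 \in \sigma(A)$. Find an eigenvector associated with the eigenvalue 5.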


Recall that evaluation of a polynomial

$$p(t) = a_k t^k + a_{k-1} t^{k-1} + \cdots + a_1 t + a_0$$

at a matrix $A \in M_n$ is well defined since we may raise a square matrix to
a positive integral power and may form linear combinations of matrices of
the same size. Thus,

$$p(A) = a_k A^k + a_{k-1} A^{k-1} + \cdots + a_1 A + a_0 I \tag{1.1.5}$$

It is useful to observe that a matrix related to $A \in M_n$ by the action of
a polynomial has the same eigenvectors as $A$; its eigenvalues are linked to
those of $A$ in a simple way.


1.1.6 Theorem. Let $p(\cdot)$ be a given polynomial. If $\lambda$ is an
eigenvalue of $A \in M_n$, while $x$ is an associated eigenvector, then
$p(\lambda)$ is an eigenvalue of the matrix $p(A)$ and $x$ is an eigenvector
of $p(A)$ associated with $p(\lambda)$.


Proof: Consider $p(A)x$. First,

$$p(A)x = a_k A^k x + a_{k-1} A^{k-1} x + \cdots + a_1 A x + a_0 x$$

Second, $A^j x = A^{j-1} A x = A^{j-1} \lambda x = \lambda A^{j-1} x = \cdots = \lambda^j x$
by repeated application of the eigenvalue-eigenvector equation. Thus,

$$p(A)x = a_k \lambda^k x + \cdots + a_0 x = (a_k \lambda^k + \cdots + a_0)x = p(\lambda)x. \qquad \Box$$
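Theorem 1.1.6 can be checked numerically on the 2-by-2 matrix with rows (7, -2) and (4, 1), which has eigenvalue 3 with eigenvector $[1, 2]^T$. The following sketch evaluates $p(A)x$ by Horner's rule applied to vectors; the function names and the sample polynomial $p(t) = t^2 - 2t + 5$ are assumptions for the example.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def poly_at_scalar(coeffs, t):
    """Horner evaluation of p(t); coeffs are listed from highest degree to lowest."""
    result = 0
    for c in coeffs:
        result = result * t + c
    return result


def poly_at_matrix_times_vector(coeffs, A, x):
    """Compute p(A) x without forming p(A): y <- A y + c x for each coefficient c."""
    y = [0] * len(x)
    for c in coeffs:
        Ay = matvec(A, y)
        y = [u + c * v for u, v in zip(Ay, x)]
    return y


A = [[7, -2], [4, 1]]
x = [1, 2]                 # eigenvector of A for the eigenvalue 3
lam = 3
p = [1, -2, 5]             # hypothetical sample polynomial t^2 - 2t + 5

lhs = poly_at_matrix_times_vector(p, A, x)        # p(A) x
rhs = [poly_at_scalar(p, lam) * xi for xi in x]   # p(lambda) x
```

Both sides come out equal, so $x$ is an eigenvector of $p(A)$ for the eigenvalue $p(3) = 8$.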


Exercise. If $\sigma(A) = \{-1, 2\}$, $A \in M_3$, what is $\sigma(A^2)$?


Exercise. If $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is a diagonal matrix
(0.9.1), what is $\sigma(D)$? Give an eigenvector associated with each
eigenvalue. Hint: Consider the standard basis vectors $e_i$, $i = 1, \ldots, n$.


1.1.7 Observation. A matrix $A \in M_n$ is singular if and only if $0 \in \sigma(A)$.


Proof: The matrix $A$ is singular if and only if $Ax = 0$ for some $x \neq 0$.
This happens if and only if $Ax = 0x$ for some $x \neq 0$, that is, if and
only if $\lambda = 0$ is an eigenvalue. $\Box$


Problems 


1. Suppose that $A \in M_n$ is nonsingular. According to (1.1.7), this is
equivalent to saying that $A$ has no eigenvalues equal to 0. If
$\lambda \in \sigma(A)$, show that $\lambda^{-1} \in \sigma(A^{-1})$. If
$Ax = \lambda x$ and $x \neq 0$, give an eigenvector of $A^{-1}$ associated
with $\lambda^{-1}$.


2. If the sum of the entries in each row of $A \in M_n$ is 1, show that
$1 \in \sigma(A)$. Hint: Consider the vector $e = [1, 1, \ldots, 1]^T$ and
observe that the row sums of $A$ are all equal if and only if $e$ is an
eigenvector of $A$. If $A$ is nonsingular, show that the row sums of $A^{-1}$
are also 1. Given a polynomial $p(t)$, show that the row sums of $p(A)$ are
all equal. To what?


3. Let $A \in M_n(\mathbf{R})$. If $\lambda$ is a real eigenvalue of $A$ with
$Ax = \lambda x$, $0 \neq x \in \mathbf{C}^n$, let $x = \xi + i\eta$, where
$\xi, \eta \in \mathbf{R}^n$ are the entrywise real and imaginary parts of
$x$. Show that $A\xi = \lambda\xi$ and $A\eta = \lambda\eta$; conclude that
there is a real eigenvector of $A$ associated with $\lambda$. Must both $\xi$
and $\eta$ be eigenvectors of $A$? Can there be a real eigenvector associated
with a complex non-real eigenvalue of $A$?


4. Consider the block diagonal matrix (0.9.2)

$$A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}, \qquad A_{ii} \in M_{n_i}$$

Show that the eigenvalues of $A$ are those of $A_{11}$ together with those of
$A_{22}$. Hint: First express the eigenvectors of $A$ in terms of those of
$A_{11}$ and $A_{22}$.


5. $A \in M_n$ is called idempotent if $A^2 = A$. Show that each eigenvalue of
an idempotent matrix is either 0 or 1.


6. $A \in M_n$ is called nilpotent if $A^q = 0$ for some positive integer $q$.
The minimum such $q$ is called the index of nilpotence. Show that all
eigenvalues of a nilpotent matrix are 0. In the process, give an example of a
nonzero matrix all of whose eigenvalues are equal to 0.


7. As we shall see, in the finite-dimensional setting upon which we
concentrate, every complex or real square matrix has a complex eigenvalue.
However, it is possible for a linear transformation on an infinite
dimensional vector space to have no eigenvalues whatsoever. Let $V$ be the
vector space of all formal infinite sequences of complex numbers:

$$V = \{(a_1, a_2, \ldots, a_k, \ldots) : a_i \in \mathbf{C}, \ i = 1, 2, \ldots\}$$

and define a linear transformation $S$ on $V$ by

$$S(a_1, a_2, \ldots) = (0, a_1, a_2, \ldots)$$

This transformation is sometimes called the shift operator. Verify that $S$
is a linear transformation and show that $S$ has no eigenvalues. Hint:
Show that for a vector to be an eigenvector, all its components would
have to be the same, but the only possible common value would be 0. The
proposed vector would therefore have to be the zero vector, which cannot
be an eigenvector.


8. A matrix $A \in M_n$ is called Hermitian if $A^* = A$ (0.2.5). If $A$ is
Hermitian, show that all eigenvalues of $A$ are real. Hint: Let
$\lambda \in \sigma(A)$ be arbitrary, and let $x$ be an associated
eigenvector. Then (1.1.3) implies that $x^*Ax = \lambda x^*x$. But
$\overline{x^*Ax} = x^*A^*x = x^*Ax$, so that $x^*Ax$ is real. Since $x^*x$ is
positive, $\lambda = x^*Ax / x^*x$ is real also.


1.2 The characteristic polynomial 


Natural questions to ask about the eigenvalues of $A \in M_n$ are: How many
are there? and, How may they be characterized?

The eigenvalue-eigenvector equation (1.1.3) may be rewritten equivalently as

$$(\lambda I - A)x = 0, \qquad x \neq 0 \tag{1.2.1}$$

Thus, $\lambda \in \sigma(A)$ if and only if $\lambda I - A$ is a singular
matrix, that is,

$$\det(\lambda I - A) = 0 \tag{1.2.2}$$


1.2.3 Definition. Thought of as a formal polynomial in $t$, the
characteristic polynomial of $A \in M_n$ is defined by

$$p_A(t) = \det(tI - A)$$

Note: We use $t$ as the formal variable in the characteristic polynomial
to distinguish it from $\lambda$, a generic eigenvalue or zero of the
polynomial. Elsewhere, the same symbol is sometimes used for both.


1.2.4 Observation. If $A \in M_n$, the characteristic polynomial $p_A(\cdot)$
has degree $n$ and the set of roots of $p_A(t) = 0$ coincides with $\sigma(A)$.


Proof: That $p_A(\cdot)$ has degree $n$ follows inductively from Laplace
expansion of $\det(tI - A)$ by minors: each row of $tI - A$ contributes one
and only one power of $t$ as the determinant is expanded. The second
statement is the equivalence of (1.1.3) and (1.2.2). $\Box$


Exercise. Show that the roots of $\det(A - tI) = 0$ are the same as those of
$\det(tI - A) = 0$ and that $\det(A - tI) = (-1)^n \det(tI - A)$. Thus, the
characteristic polynomial could alternatively be (and sometimes is) defined as
$\det(A - tI)$. Show that the convention we have chosen ensures that the
(leading) coefficient of $t^n$ is always $+1$.


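Exercise. If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_2$, show
that $p_A(t) = t^2 - (a + d)t + (ad - bc)$ and that

$$\sigma(A) = \left\{ \tfrac{1}{2}\left[ a + d \pm \sqrt{(a - d)^2 + 4bc} \right] \right\}$$

If $A \in M_2(\mathbf{R})$, show that the eigenvalues of $A$ are real if
$bc \geq 0$. Furthermore, they are real if and only if
$(a - d)^2 + 4bc \geq 0$. If they are not real, they occur as a complex
conjugate pair. Finally, show that the eigenvalues are distinct if
$(a - d)^2 + 4bc \neq 0$.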
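The closed-form spectrum of a 2-by-2 matrix translates directly into a few lines of code. This is a minimal sketch using the standard `cmath` module so that a negative discriminant yields a complex conjugate pair; the function name `eigenvalues_2x2` is an assumption for illustration.

```python
import cmath


def eigenvalues_2x2(a, b, c, d):
    """Roots of t^2 - (a + d) t + (ad - bc) for the matrix with rows (a, b), (c, d)."""
    discriminant = (a - d) ** 2 + 4 * b * c
    root = cmath.sqrt(discriminant)       # complex square root handles all signs
    return ((a + d + root) / 2, (a + d - root) / 2)


real_pair = eigenvalues_2x2(7, -2, 4, 1)      # discriminant 4: real and distinct
complex_pair = eigenvalues_2x2(0, 1, -1, 0)   # discriminant -4: conjugate pair
```

The first call returns the values 5 and 3 (as complex numbers with zero imaginary part), and the second returns the conjugate pair $\pm i$.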


In certain general situations the eigenvalues of a matrix are easy to 
perceive. Most often, these are situations in which the determinant is 
easy to calculate due to the form of the matrix. These include diagonal 
and triangular matrices and some other special situations. 


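Exercise. Show that if $T \in M_n$ is triangular,

$$T = \begin{bmatrix} t_{11} & \cdots & t_{1n} \\ & \ddots & \vdots \\ 0 & & t_{nn} \end{bmatrix}$$

then $\sigma(T) = \{t_{11}, t_{22}, \ldots, t_{nn}\}$, the diagonal entries of $T$.


Exercise. Each entry of the matrix $J_n \in M_n$ is equal to 1:

$$J_n = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}$$

What are the eigenvalues of $J_3$? Show that 0 (occurring twice) and 3 are
the only eigenvalues of $J_3$. What is the analog of this for general $n$?
Hint: Consider the vector $e = [1, 1, \ldots, 1]^T$.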


Exercise. Determine all the eigenvalues, and an associated eigenvector
for each, of the matrix

$$A = \begin{bmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{bmatrix}$$

Hint: Use the previous exercise, and write $A = 4I - J_3$.


1.2.5 Definition. Recall from (0.7.1) that a $k$-by-$k$ principal submatrix
of $A \in M_n$ is one lying in the same set of $k$ rows and columns and that a
$k$-by-$k$ principal minor is the determinant of such a principal submatrix.
There are $\binom{n}{k}$ different $k$-by-$k$ principal minors of
$A = [a_{ij}]$, and the sum of these is denoted by $E_k(A)$. In particular,
$E_1(A) = \sum_{i=1}^{n} a_{ii}$ is called the trace of $A$ and is usually
denoted by $\mathrm{tr}\, A$ or trace $A$. Note that $E_n(A) = \det A$.


Exercise. If $A \in M_2$, show that $p_A(t) = t^2 - (\mathrm{tr}\, A)t + \det A$
and that $\sum_{\lambda \in \sigma(A)} \lambda = \mathrm{tr}\, A$ and
$\prod_{\lambda \in \sigma(A)} \lambda = \det A$.


A fundamental and nontrivial fact, called the fundamental theorem of 
algebra (Appendix C), is that a polynomial of degree n with complex 
coefficients has exactly n zeroes, counting multiplicities, among the com- 
plex numbers. In view of this we may make the very important following 
observation. 


1.2.6 Observation. Each matrix $A \in M_n$ has, among the complex
numbers, exactly $n$ eigenvalues, counting multiplicities.

Note: At this point, when we refer to the "multiplicity" of an eigenvalue
$\lambda$ of $A \in M_n$, we simply mean the number of times $\lambda$ occurs
as a zero of the characteristic polynomial $p_A(\cdot)$. A more thorough
discussion of multiplicity of eigenvalues will come in Section (1.4), but it
is useful to know that there is a connection between the derivatives of a
polynomial and the multiplicity of a zero of the polynomial. A polynomial
$p(t)$ has $\lambda$ as a zero of multiplicity $k \geq 1$ if and only if we
can write $p(t)$ in the form $p(t) = (t - \lambda)^k q(t)$, where $q(t)$ is a
polynomial such that $q(\lambda) \neq 0$. Differentiating this identity gives
$p'(t) = k(t - \lambda)^{k-1} q(t) + (t - \lambda)^k q'(t)$, and from this
representation it is clear that $p'(\lambda) = 0$ if and only if $k > 1$. If
$k > 1$, $p''(t) = k(k - 1)(t - \lambda)^{k-2} q(t)$ + polynomial terms each
involving a factor $(t - \lambda)^m$ with $m \geq k - 1$, so
$p''(\lambda) = 0$ if and only if $k > 2$. Repetition of this calculation
shows that $\lambda$ is a zero of $p(t)$ of multiplicity $k$ if and only if
$p(\lambda) = p'(\lambda) = \cdots = p^{(k-1)}(\lambda) = 0$ and
$p^{(k)}(\lambda) \neq 0$.
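The derivative criterion for multiplicity can be turned into a short procedure: differentiate repeatedly and count how many successive derivatives vanish at the point. The sketch below assumes exact (integer or rational) coefficients listed from the constant term upward and a polynomial that is not identically zero; the function names are hypothetical.

```python
def poly_eval(coeffs, t):
    """Horner evaluation; coeffs[k] is the coefficient of t^k."""
    result = 0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def derivative(coeffs):
    """Coefficient list of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def multiplicity(coeffs, lam):
    """Number of successive derivatives (starting with p itself) that vanish at lam."""
    k = 0
    while coeffs and poly_eval(coeffs, lam) == 0:
        k += 1
        coeffs = derivative(coeffs)
    return k


# Sample polynomial (t - 1)^2 (t + 2) = t^3 - 3t + 2, constant term first.
p = [2, -3, 0, 1]
```

For this sample, the zero at 1 has multiplicity 2, the zero at -2 has multiplicity 1, and a point that is not a zero gets multiplicity 0.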


1.2.7 Examples. The statement (1.2.6) depends heavily on the fact that
the complex field is algebraically closed, that is, each polynomial of
degree $n$ with coefficients from the field has $n$ zeroes in the field. For
matrices over other fields, such as the real numbers or rational numbers,
little in general can be said about how many eigenvalues a matrix has in
the field. See Problem 8 of Section (1.1) for an example of something
that can be said, however. Also, in the case of any field, a matrix could
have very few different eigenvalues. The matrix

$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \tag{1.2.7a}$$

has no real eigenvalues, though all its entries are real. The matrix

$$\begin{bmatrix} 1 & 1 & & 0 \\ & 1 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 1 \end{bmatrix} \in M_n \tag{1.2.7b}$$

has only one distinct eigenvalue (1 with multiplicity $n$), regardless of its
size.


Exercise. Verify the statements in (1.2.7). 


Exercise. If $A \in M_n(\mathbf{R})$ and if $n$ is odd, show that $A$ has at
least one real eigenvalue. Hint: Recall that any nonreal complex zeroes of a
polynomial with real coefficients must occur in conjugate pairs and note that
$p_A(\cdot)$ has real coefficients if $A \in M_n(\mathbf{R})$.


In view of (1.2.6), we may list the eigenvalues of $A \in M_n$ as

$$\lambda_1, \lambda_2, \ldots, \lambda_n$$

where the ordering is arbitrary and we repeat eigenvalues according to
multiplicity. Then, because of (1.2.4), we know that

$$p_A(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_n) \tag{1.2.8}$$


1.2.9 Definition. The $k$th elementary symmetric function of the $n$
numbers $\lambda_1, \ldots, \lambda_n$, $k \leq n$, is

$$S_k(\lambda_1, \ldots, \lambda_n) = \sum_{1 \leq i_1 < \cdots < i_k \leq n} \ \prod_{j=1}^{k} \lambda_{i_j}$$

the sum of all $\binom{n}{k}$ $k$-fold products of distinct items from
$\lambda_1, \ldots, \lambda_n$.

For example, $S_1(\lambda_1, \ldots, \lambda_n) = \lambda_1 + \cdots + \lambda_n$
is the sum of the $\lambda_i$ and
$S_n(\lambda_1, \ldots, \lambda_n) = \lambda_1 \cdots \lambda_n$ is the
product of the $\lambda_i$. Because of (1.2.8) and the fact that $p_A(t)$ is
defined by a certain determinant, there is a connection between the
elementary symmetric functions $S_k(\lambda_1, \ldots, \lambda_n)$ of the
eigenvalues of a matrix $A$ and the $E_k(A)$, the sums of the $k$-by-$k$
principal minors of $A$ (1.2.5). The following two identities are
straightforward, if laborious, to verify:

$$(t - \lambda_1) \cdots (t - \lambda_n) = t^n - S_1(\lambda_1, \ldots, \lambda_n) t^{n-1} + S_2(\lambda_1, \ldots, \lambda_n) t^{n-2} - \cdots \pm S_n(\lambda_1, \ldots, \lambda_n) \tag{1.2.10}$$

and

$$p_A(t) = t^n - E_1(A) t^{n-1} + E_2(A) t^{n-2} - \cdots \pm E_n(A) \tag{1.2.11}$$


Exercise. Convince yourself of (1.2.10) and (1.2.11). The former may be
verified directly by picking out the coefficient of $t^{n-k}$ in the product
$(t - \lambda_1) \cdots (t - \lambda_n)$, and the latter may be verified
inductively by Laplace expansion.


Combining (1.2.10) and (1.2.11) with (1.2.8), we have the following 
theorem. 


1.2.12 Theorem. If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of
$A \in M_n$, then

$$S_k(\lambda_1, \ldots, \lambda_n) = E_k(A)$$

The $k$th elementary symmetric function of the eigenvalues of $A$ is the
sum of the $k$-by-$k$ principal minors of $A$. In particular

$$\mathrm{tr}\, A = \sum_{i=1}^{n} \lambda_i$$

and

$$\det A = \prod_{i=1}^{n} \lambda_i$$

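The identity between elementary symmetric functions of the eigenvalues and sums of principal minors can be verified on a small example. The sketch below uses a hypothetical upper triangular 3-by-3 matrix, whose eigenvalues are its diagonal entries 2, 3, 5; the helper names are assumptions, and `math.prod` requires Python 3.8 or later.

```python
from itertools import combinations
from math import prod


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not M:
        return 1
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def principal_minor_sum(A, k):
    """Sum of the determinants of all k-by-k principal submatrices of A."""
    n = len(A)
    return sum(det([[A[i][j] for j in idx] for i in idx])
               for idx in combinations(range(n), k))


def elementary_symmetric(values, k):
    """Sum of all k-fold products of distinct items from values."""
    return sum(prod(c) for c in combinations(values, k))


A = [[2, 1, 0],
     [0, 3, 1],
     [0, 0, 5]]
eigenvalues = [2, 3, 5]        # diagonal entries of the triangular matrix

E = [principal_minor_sum(A, k) for k in (1, 2, 3)]
S = [elementary_symmetric(eigenvalues, k) for k in (1, 2, 3)]
```

Both lists come out as 10, 31, 30, corresponding to the trace, the sum of the 2-by-2 principal minors, and the determinant.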
Problems 


1. Verify (1.1.7) using (1.2.12). 


2. For matrices $A \in M_{m,n}$ and $B \in M_{n,m}$ [see (0.2.1)], show by
direct calculation that $\mathrm{tr}\, AB = \mathrm{tr}\, BA$. Use this fact
to show that for $A \in M_n$ and nonsingular $S \in M_n$,
$\mathrm{tr}\, S^{-1}AS = \mathrm{tr}\, A$. The matrix $S^{-1}AS$ is called a
similarity of $A$, and this result says that the trace is a similarity
invariant. Similarity is the subject of the next section, and we shall find
that all the principal minor sums $E_k(A)$ are similarity invariants. Note
that the determinant is trivially a similarity invariant because of
multiplicativity.


3. If $D \in M_n$ is diagonal, compute the characteristic polynomial $p_D(t)$
and show that $p_D(D) = 0$.


4. Let $A \in M_n$, and let $A_i = A(\{i\}') \in M_{n-1}$, the principal
submatrix of $A$ resulting from deleting row and column $i$,
$i = 1, \ldots, n$. Show that

$$\frac{d}{dt}\, p_A(t) = \sum_{i=1}^{n} p_{A_i}(t) \tag{1.2.13}$$


5. Recall Problem 6 of the preceding section. Show that the trace of a
nilpotent matrix is 0. What is the characteristic polynomial of a nilpotent
matrix?


6. If $\lambda \in \sigma(A)$ has multiplicity 1 as a root of $p_A(t) = 0$,
$A \in M_n$, show that $\mathrm{rank}(A - \lambda I) = n - 1$, but not
necessarily conversely. [Recall example (1.2.7b).] Hint: Use (1.2.13) and the
fact that $(d/dt)\, p_A(t) \neq 0$ at $t = \lambda$ to conclude that some
principal submatrix of $A - \lambda I$ of size $n - 1$ is nonsingular.


7. Use (1.2.12) to determine the characteristic polynomial of the matrix

$$\begin{bmatrix}
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}$$

Consider how this procedure could be used to compute the characteristic
polynomial of a general $n$-by-$n$ tridiagonal matrix (0.9.10).


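8. If $A \in M_n$ and $\sigma(A) = \{\lambda_1, \ldots, \lambda_n\}$, assume
that $\sigma(A^k) = \{\lambda_1^k, \ldots, \lambda_n^k\}$. Show

$$\mathrm{tr}\, A^k = \sum_{i=1}^{n} \lambda_i^k$$

for all positive integers $k$. The right-hand sum is called the $k$th moment
of the eigenvalues of $A$. The stated assumptions follow from (2.3.1).


9. Explicitly compute $S_2(\lambda_1, \ldots, \lambda_6)$,
$S_3(\lambda_1, \ldots, \lambda_6)$, $S_4(\lambda_1, \ldots, \lambda_6)$, and
$S_5(\lambda_1, \ldots, \lambda_6)$.
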
10. Let $V$ be a vector space over a field $\mathbf{F}$. An eigenvalue of a
linear transformation $T: V \to V$ is a scalar $\lambda \in \mathbf{F}$ such
that there is a nonzero vector $v \in V$ with $Tv = \lambda v$. Show that if
$\mathbf{F}$ is the field of complex numbers and if $V$ is finite-dimensional,
then every linear transformation $T$ has an eigenvalue. Give examples to show
that if either hypothesis is weakened (finite dimensionality of $V$ or
$\mathbf{F} = \mathbf{C}$), then $T$ may not have an eigenvalue. Hint: Let
$\mathcal{B}$ be a basis for $V$ and consider $[T]_{\mathcal{B}}$.


11. Let $p(t) = a_n t^n + a_{n-1} t^{n-1} + \cdots + a_1 t + a_0$, $a_n = 1$,
be a given monic polynomial with zeroes $\lambda_1, \lambda_2, \ldots, \lambda_n$,
including multiplicities. Denote the $k$th moments of the zeroes by
$\mu_k = \lambda_1^k + \lambda_2^k + \cdots + \lambda_n^k$, $k = 1, 2, \ldots$.
Demonstrate Newton's identities

$$k a_{n-k} + \mu_1 a_{n-k+1} + \mu_2 a_{n-k+2} + \cdots + \mu_k a_n = 0, \qquad k = 1, 2, \ldots, n \tag{1.2.14}$$

Explain why the first $n$ moments of the zeroes uniquely determine the
coefficients of the polynomial $p(t)$ (and hence the zeroes) and conversely.
Hint: Show that for some $R > 0$, if $|t| > R$, then
$(t - \lambda_i)^{-1} = t^{-1} + \lambda_i t^{-2} + \lambda_i^2 t^{-3} + \cdots$
and hence

$$f(t) \equiv \sum_{i=1}^{n} (t - \lambda_i)^{-1} = n t^{-1} + \mu_1 t^{-2} + \mu_2 t^{-3} + \cdots \qquad \text{for } |t| > R$$

Show that $p'(t) = p(t) f(t)$, from which the Newton identities and the
additional identities

$$\mu_k a_0 + \mu_{k+1} a_1 + \cdots + \mu_{n+k-1} a_{n-1} + \mu_{n+k} a_n = 0, \qquad k = 1, 2, \ldots$$

for the higher-order moments follow from a comparison of coefficients.


12. Let $A, B \in M_n$ be given. Show that $A$ and $B$ have the same
eigenvalues if and only if $\mathrm{tr}\, A^k = \mathrm{tr}\, B^k$ for
$k = 1, 2, \ldots, n$. Hint: Use Problem 8 and the Newton identities (1.2.14)
to show that the characteristic polynomials of $A$ and $B$ are the same.
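Newton's identities give a simple recursion for recovering the coefficients of a monic polynomial from the first $n$ moments of its zeroes. The following sketch solves the identities for the coefficients one at a time, using exact rational arithmetic; the function name and sample data are assumptions for illustration.

```python
from fractions import Fraction


def coefficients_from_moments(moments):
    """Given [mu_1, ..., mu_n], return [a_0, ..., a_n] of the monic polynomial (a_n = 1).

    Solves k*a_{n-k} + mu_1*a_{n-k+1} + ... + mu_k*a_n = 0 for a_{n-k}, k = 1, ..., n.
    """
    n = len(moments)
    a = [Fraction(0)] * n + [Fraction(1)]
    for k in range(1, n + 1):
        s = sum(moments[i - 1] * a[n - k + i] for i in range(1, k + 1))
        a[n - k] = -s / k
    return a


# Zeroes 1, 2, 3 have moments 6, 14, 36; the polynomial is t^3 - 6t^2 + 11t - 6.
from_zeroes = coefficients_from_moments([6, 14, 36])

# For the matrix with rows (7, -2), (4, 1): tr A = 8 and tr A^2 = 34.
from_traces = coefficients_from_moments([8, 34])
```

The second call uses traces of powers as the moments and returns the coefficients of $t^2 - 8t + 15$, whose zeroes are 3 and 5.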


1.3 Similarity 


As indicated in Section (1.0), a similarity transformation of a matrix in
$M_n$ corresponds to representation of a linear transformation on
$\mathbf{C}^n$ in another basis. Thus, studying similarity can be thought of
as studying properties which are intrinsic to a linear transformation, or the
properties which are common to all its various basis representations.


1.3.1 Definition. A matrix $B \in M_n$ is said to be similar to a matrix
$A \in M_n$ if there exists a nonsingular matrix $S \in M_n$ such that

$$B = S^{-1}AS$$

The transformation $A \to S^{-1}AS$ is called a similarity transformation by
the similarity matrix $S$. The relation "$B$ is similar to $A$" is sometimes
abbreviated $B \sim A$.




1.3.2 Observation. Similarity is an equivalence relation on $M_n$, that is,
similarity is

(a) reflexive: A ~ A; 

(b) symmetric: B ~ A implies A ~ B; and 

(c) transitive: C ~ B and B ~ A imply C~ A. 


Exercise. Verify (1.3.2). 


Like any equivalence relation, the similarity relation partitions the set
$M_n$ into disjoint equivalence classes. Each equivalence class is the set of
all matrices in $M_n$ similar to a given matrix, a representative of the class.
All matrices in an equivalence class are similar, and matrices in two dif- 
ferent classes are not. Because of transitivity, the first and last matrices in 
an arbitrary finite sequence of similar matrices are in the same similarity 
equivalence class. The crucial observation is that matrices in an equiva- 
lence class share many important properties. Some of these will be men- 
tioned here, but a more complete description of the similarity invariants 
(e.g., Jordan canonical form) will come later in Chapter 3. 


1.3.3 Theorem. Let $A, B \in M_n$. If $B$ is similar to $A$, then the
characteristic polynomial of $B$ is the same as that of $A$.


Proof: For any $t$ we have

$$\begin{aligned}
p_B(t) &= \det(tI - B) \\
&= \det(tS^{-1}S - S^{-1}AS) = \det S^{-1}(tI - A)S \\
&= \det S^{-1} \det(tI - A) \det S = (\det S)^{-1} (\det S) \det(tI - A) \\
&= \det(tI - A) = p_A(t) \qquad \Box
\end{aligned}$$


1.3.4 Corollary. If $A, B \in M_n$ and if $A$ and $B$ are similar, then they
have the same eigenvalues, counting multiplicity.
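
As a small numerical check of (1.3.3) and (1.3.4), the following sketch forms $B = S^{-1}AS$ exactly for one 2-by-2 pair and compares characteristic polynomials. The matrices `A` and `S` and the helper names `matmul`, `inv2`, and `char_poly_2x2` are illustrative choices, not part of the text; the sketch assumes the 2-by-2 formula $p_M(t) = t^2 - (\operatorname{tr} M)t + \det M$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(S):
    """Exact inverse of a nonsingular 2-by-2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly_2x2(M):
    """Coefficients (1, -tr M, det M) of p_M(t) = t^2 - (tr M) t + det M."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

A = [[2, 1], [0, 3]]
S = [[1, 2], [1, 3]]                     # det S = 1, so S is nonsingular
B = matmul(matmul(inv2(S), A), S)        # B = S^{-1} A S

same_poly = char_poly_2x2(A) == char_poly_2x2(B)
print(B, same_poly)
```

A single example proves nothing, of course; it only illustrates that $B$ can look quite different from $A$ while sharing its characteristic polynomial.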


1.3.5 Example. Having the same eigenvalues is a necessary but not
sufficient condition for similarity. Consider the matrices

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Each has the eigenvalue 0 with multiplicity 2, but they are not similar.


Exercise. Show that the only matrix similar to the zero matrix is itself 
and use this fact to verify the statements in Example (1.3.5). 




Exercise. If the matrices $A, B \in M_n$ are similar and if $q(\cdot)$ is a
polynomial, show that $q(A)$ and $q(B)$ are similar. In particular, show that
$A + \alpha I$ and $B + \alpha I$ are similar if $\alpha$ is a scalar.


Exercise. If $A, B, C, D \in M_n$ and $A \sim B$ via the similarity matrix $S$ and
$C \sim D$ via the same similarity matrix $S$, show that $A + C \sim B + D$.


Exercise. If $A, S \in M_n$ and if $S$ is nonsingular, show that
$E_k(S^{-1}AS) = E_k(A)$ and, in particular, that $\det S^{-1}AS = \det A$ and
$\operatorname{tr} S^{-1}AS = \operatorname{tr} A$; that is, the determinant,
trace, and other sums of k-by-k principal minors are similarity invariants.


Exercise. Show that rank is also a similarity invariant: If $B \in M_n$ is
similar to $A \in M_n$, then $\operatorname{rank} B = \operatorname{rank} A$.
Hint: See (0.4.6).


Since diagonal matrices are especially simple and have very nice
properties, it is of interest to know for which $A \in M_n$ there is a
diagonal matrix in the similarity equivalence class of $A$, that is, which
matrices are similar to diagonal matrices.


1.3.6 Definition. If the matrix $A \in M_n$ is similar to a diagonal matrix,
then $A$ is said to be diagonalizable. Sometimes the term diagonable is
used.


1.3.7 Theorem. Let $A \in M_n$. Then $A$ is diagonalizable if and only if
there is a set of $n$ linearly independent vectors, each of which is an
eigenvector of $A$.


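Proof: If $A$ has $n$ linearly independent eigenvectors $x^{(1)}, \ldots, x^{(n)}$,
form a nonsingular matrix $S$ with them as columns and calculate

$$\begin{aligned}
S^{-1}AS &= S^{-1}[Ax^{(1)} \; Ax^{(2)} \; \cdots \; Ax^{(n)}] \\
&= S^{-1}[\lambda_1 x^{(1)} \; \cdots \; \lambda_n x^{(n)}] = S^{-1}[x^{(1)} \; \cdots \; x^{(n)}]\Lambda \\
&= S^{-1}S\Lambda = \Lambda
\end{aligned}$$

where

$$\Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$$

and $\lambda_1, \ldots, \lambda_n$ are eigenvalues of $A$.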
Conversely, suppose that there is a similarity matrix $S$ such that
$S^{-1}AS = \Lambda$ is diagonal. Then $AS = S\Lambda$. This means that $A$ times
the $i$th column of $S$ (i.e., the $i$th column of $AS$) is the $i$th diagonal
entry of $\Lambda$ times the $i$th column of $S$ (i.e., the $i$th column of
$S\Lambda$), or that the $i$th column of $S$ is an eigenvector of $A$ associated
with the $i$th diagonal entry of $\Lambda$. Since $S$ is nonsingular, there are
$n$ linearly independent eigenvectors. □


Note that the proof of (1.3.7) is, in principle, an algorithm for diag- 
onalizing a diagonalizable matrix: find the eigenvalues of A; find asso- 
ciated eigenvectors, counting multiplicity, and array them in the matrix 
S. If the eigenvectors are linearly independent, then S is a diagonalizing 
similarity matrix. We stress, however, that, except for small analytical 
examples, this is not a practical computational procedure. 
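
For a small analytical example the procedure can be carried out exactly. The sketch below assumes that the eigenpairs of one particular 2-by-2 matrix have already been found by hand; it checks them, arrays the eigenvectors as the columns of $S$, and forms $S^{-1}AS$. The matrix, the eigenpairs, and the helper names are illustrative assumptions, and, as stressed above, this is not how eigenvalues are computed in practice.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(S):
    """Exact inverse of a nonsingular 2-by-2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[4, 1], [2, 3]]
# Eigenpairs (eigenvalue, eigenvector) worked out by hand for this A.
eigenpairs = [(2, [1, -2]), (5, [1, 1])]

# Confirm A v = lam v for each claimed pair.
pairs_ok = all(
    [sum(a * x for a, x in zip(row, v)) for row in A] == [lam * x for x in v]
    for lam, v in eigenpairs
)

# Array the eigenvectors as the columns of S and form S^{-1} A S.
S = [[v[i] for _, v in eigenpairs] for i in range(2)]
Lambda = matmul(matmul(inv2(S), A), S)
print(pairs_ok, Lambda)
```

The diagonal entries of the result appear in the same order as the eigenvectors were placed in $S$.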

Remark: If $A \in M_n$ is diagonalizable, the diagonal entries of any
diagonal matrix to which it is similar must be the eigenvalues of $A$, with
proper multiplicities. Moreover, the linearly independent eigenvectors
(which make up the similarity matrix) must correspond to the different
eigenvalues with proper multiplicities; that is, if $x^{(1)}, \ldots, x^{(n)}$
are linearly independent eigenvectors and
$p_A(t) = (t - \lambda_1) \cdots (t - \lambda_n)$, then
$Ax^{(i)} = \lambda_{\tau(i)} x^{(i)}$ for some permutation $\tau$ of the indices.


Exercise. Show that the matrix $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$
is not diagonalizable. Reason, on the one hand, that if it were
diagonalizable, it would be similar to the 0 matrix, which it is not; and,
on the other hand, calculate that, up to a factor of scale, there is only
one eigenvector associated with 0.


Exercise. If $A$ is diagonalizable and if $q(\cdot)$ is a polynomial, show that
$q(A)$ is diagonalizable. Hint: $q(S\Lambda S^{-1}) = S q(\Lambda) S^{-1}$.
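
The identity in the hint can be checked on a concrete case. The sketch below takes one diagonalizable 2-by-2 matrix $A = S\Lambda S^{-1}$ and the polynomial $q(t) = t^2 + 1$, both chosen only for illustration, and compares $q(A)$ computed directly with $S q(\Lambda) S^{-1}$. All names in the code are hypothetical helpers.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(S):
    """Exact inverse of a nonsingular 2-by-2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def q(t):
    """Illustrative polynomial q(t) = t^2 + 1."""
    return t * t + 1

S = [[1, 1], [-2, 1]]
lam = [2, 5]
Lam = [[lam[i] if i == j else 0 for j in range(2)] for i in range(2)]
A = matmul(matmul(S, Lam), inv2(S))          # A = S Lambda S^{-1}

# q(A) computed directly as A^2 + I.
A2 = matmul(A, A)
q_direct = [[A2[i][j] + (1 if i == j else 0) for j in range(2)] for i in range(2)]

# q(A) computed through the diagonal form: q acts entrywise on the diagonal.
q_Lam = [[q(lam[i]) if i == j else 0 for j in range(2)] for i in range(2)]
q_via_diag = matmul(matmul(S, q_Lam), inv2(S))
print(q_direct == q_via_diag)
```

Since $q(\Lambda)$ is again diagonal, the same $S$ that diagonalizes $A$ diagonalizes $q(A)$.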


Exercise. If $A \in M_n$ and if $\lambda \in \sigma(A)$ has multiplicity $m$ as an
eigenvalue of $A$, show that $A$ is not diagonalizable if
$\operatorname{rank}(A - \lambda I) > n - m$.


A simple circumstance in which diagonalizability is assured is that in 
which the eigenvalues are distinct. An important precursor to this fact, 
which is otherwise useful, is the following lemma. 


1.3.8 Lemma. Suppose that $\lambda_1, \ldots, \lambda_k$ are eigenvalues of
$A \in M_n$, no two of which are the same, and suppose that $x^{(i)}$ is an
eigenvector associated with $\lambda_i$, $i = 1, \ldots, k$. Then
$\{x^{(1)}, \ldots, x^{(k)}\}$ is a linearly independent set.

Proof: The proof is essentially by contradiction. Suppose that
$\{x^{(1)}, \ldots, x^{(k)}\}$ is actually a linearly dependent set. Then there is
a nontrivial linear combination which produces the 0 vector, and in fact
there is such a linear combination with the fewest nonzero coefficients.
Suppose that such a minimal linear dependence relation is

$$\alpha_1 x^{(1)} + \alpha_2 x^{(2)} + \cdots + \alpha_r x^{(r)} = 0, \qquad r \le k$$

We have $r > 1$ because all $x^{(i)} \ne 0$. We may assume for convenience
(renumber if necessary) that it involves the first $r$ vectors. We also have

$$\begin{aligned}
A(\alpha_1 x^{(1)} + \cdots + \alpha_r x^{(r)}) &= \alpha_1 Ax^{(1)} + \cdots + \alpha_r Ax^{(r)} \\
&= \alpha_1 \lambda_1 x^{(1)} + \cdots + \alpha_r \lambda_r x^{(r)} = 0
\end{aligned}$$

another dependence relation. Now multiply the first dependence relation
by $\lambda_r$ and subtract it from the second to produce

$$\alpha_1(\lambda_1 - \lambda_r)x^{(1)} + \cdots + \alpha_{r-1}(\lambda_{r-1} - \lambda_r)x^{(r-1)} = 0$$

a third dependence relation, which has fewer nonzero coefficients than
the first. This last relation is nontrivial since $\lambda_i \ne \lambda_r$,
$i = 1, \ldots, r-1$. This contradicts the minimality assumption for the first
dependence relation and completes the proof. □


1.3.9 Theorem. If $A \in M_n$ has $n$ distinct eigenvalues, then $A$ is
diagonalizable.


Proof: If $\sigma(A) = \{\lambda_1, \ldots, \lambda_n\}$, let $x^{(i)}$ be an
eigenvector associated with $\lambda_i$, $i = 1, \ldots, n$. Since the eigenvalues
are all different, $\{x^{(1)}, \ldots, x^{(n)}\}$ is a linearly independent set by
(1.3.8), and therefore $A$ is diagonalizable by (1.3.7). □


Exercise. Give an example of a diagonalizable matrix $A \in M_n$ that does
not have distinct eigenvalues.


Exercise. Recall from (0.9.5) that a permutation matrix $P$ is a 0,1 matrix
with exactly one 1 in each row and column. Thus $P^T = P^{-1}$. Show that
permutation similarity of $A \in M_n$ reorders the diagonal entries of $A$, and
show that for any diagonal matrix there is a permutation similarity with
the diagonal entries occurring in any order, in particular, with any
repeated diagonal entries occurring contiguously.


In general, matrices $A, B \in M_n$ do not commute under multiplication,
but if $A$ and $B$ are both diagonal, they always commute. The latter
observation can be generalized somewhat; the following lemma will be helpful
in this regard.




1.3.10 Lemma. Let $A \in M_n$ and $B \in M_m$ be given matrices and let

$$C = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$

be the direct sum of $A$ and $B$. Then $C$ is diagonalizable if and only if both
$A$ and $B$ are diagonalizable.


Proof: If there is a nonsingular matrix $S_1 \in M_n$ such that $S_1^{-1}AS_1$ is
diagonal and a nonsingular matrix $S_2 \in M_m$ such that $S_2^{-1}BS_2$ is
diagonal, then one checks easily that $S^{-1}CS$ is diagonal if $S$ is the
direct sum

$$S = \begin{bmatrix} S_1 & 0 \\ 0 & S_2 \end{bmatrix}$$

Conversely, if $C$ is diagonalizable, there is a nonsingular matrix
$S \in M_{n+m}$ such that
$S^{-1}CS = \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{n+m})$
is diagonal. If we write $S = [s_1 \; s_2 \; \cdots \; s_{n+m}]$ with

$$s_i = \begin{bmatrix} \xi_i \\ \eta_i \end{bmatrix} \in C^{n+m}, \quad \xi_i \in C^n, \quad \eta_i \in C^m \quad \text{for } i = 1, 2, \ldots, n+m$$

then $Cs_i = \lambda_i s_i$ implies that $A\xi_i = \lambda_i \xi_i$ and
$B\eta_i = \lambda_i \eta_i$ for $i = 1, 2, \ldots, n+m$.

If there were fewer than $n$ independent vectors in the set
$\{\xi_1, \ldots, \xi_{n+m}\}$, then the column rank (and hence the row rank) of
the matrix

$$[\xi_1 \; \xi_2 \; \cdots \; \xi_{n+m}] \in M_{n,n+m}$$

would be less than $n$. By the same reasoning, if there were fewer than $m$
independent vectors in the set $\{\eta_1, \ldots, \eta_{n+m}\}$, then the column
rank (and hence the row rank) of the matrix

$$[\eta_1 \; \eta_2 \; \cdots \; \eta_{n+m}] \in M_{m,n+m}$$

would be less than $m$. In either event (or both), the matrix

$$S = [s_1 \; \cdots \; s_{n+m}] = \begin{bmatrix} \xi_1 & \cdots & \xi_{n+m} \\ \eta_1 & \cdots & \eta_{n+m} \end{bmatrix} \in M_{n+m}$$

would have row rank (and hence rank) less than $n + m$, which is impossible
since $S$ is invertible. Thus, there are exactly $n$ independent vectors in
the set $\{\xi_1, \xi_2, \ldots, \xi_{n+m}\}$, and since each is an eigenvector of
$A$, the matrix $A$ must be diagonalizable. The same argument shows that the
matrix $B$ is diagonalizable. □


1.3.11 Definition. Two diagonalizable matrices $A, B \in M_n$ are said to be
simultaneously diagonalizable if there is a single similarity matrix
$S \in M_n$ such that $S^{-1}AS$ and $S^{-1}BS$ are both diagonal, that is, if
there is a single basis in which the representations of both linear
transformations are diagonal.


Exercise. Show that if $A, B \in M_n$ are simultaneously diagonalizable, then
they commute. Hint: Write $A = SDS^{-1}$ and $B = SES^{-1}$, $D$ and $E$
diagonal. Then calculate $AB$ and $BA$, using the fact that diagonal matrices
commute. This type of manipulation is used frequently.
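
The manipulation in the hint is easy to try on a concrete pair. In the sketch below, the similarity matrix $S$ and the diagonal matrices $D$ and $E$ are arbitrary illustrative choices; the code builds $A = SDS^{-1}$ and $B = SES^{-1}$ exactly and compares $AB$ with $BA$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(S):
    """Exact inverse of a nonsingular 2-by-2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

S = [[1, 2], [1, 3]]          # nonsingular: det S = 1
D = [[1, 0], [0, 2]]          # diagonal form of A
E = [[5, 0], [0, 7]]          # diagonal form of B

A = matmul(matmul(S, D), inv2(S))   # A = S D S^{-1}
B = matmul(matmul(S, E), inv2(S))   # B = S E S^{-1}

AB = matmul(A, B)
BA = matmul(B, A)
print(A, B, AB == BA)
```

Neither $A$ nor $B$ is diagonal here, yet $AB = S(DE)S^{-1} = S(ED)S^{-1} = BA$.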


Exercise. Show that if $A \in M_n$ is diagonalizable and if $M$ is a scalar
matrix in $M_n$, then $A$ and $M$ are simultaneously diagonalizable.


1.3.12 Theorem. Let $A, B \in M_n$ be diagonalizable. Then $A$ and $B$ commute
if and only if they are simultaneously diagonalizable.


Proof: Assume that $A$ and $B$ commute, perform a similarity transformation
on both $A$ and $B$ that diagonalizes $A$, and then assume without loss
of generality that $A$ is diagonal. Assume further, without loss of
generality, that any multiple eigenvalues of $A$ occur contiguously on the
main diagonal. Since $AB = BA$ (the common similarity has not changed
this), we have

$$\lambda_i b_{ij} = b_{ij} \lambda_j$$

where $B = [b_{ij}]$ and $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.
Since $(\lambda_i - \lambda_j) b_{ij} = 0$, we conclude that $b_{ij} = 0$ whenever
$\lambda_i \ne \lambda_j$. Thus, given the ordering of the $\lambda_i$ terms, $B$ is
a block diagonal matrix:

$$B = \begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix} \qquad (1.3.13)$$

where there is one block $B_i$ for each different eigenvalue of $A$. Each $B_i$
is square and has size equal to the multiplicity of the eigenvalue of $A$ to
which it corresponds. Since $B$ is diagonalizable, each $B_i$ is diagonalizable
by (1.3.10). Let $T_i$ be a nonsingular matrix such that $T_i^{-1}B_iT_i$ is
diagonal. Since $A$ has the partitioned form

$$A = \begin{bmatrix} \lambda_1 I & & 0 \\ & \ddots & \\ 0 & & \lambda_k I \end{bmatrix} \qquad (1.3.14)$$

where each scalar matrix $\lambda_i I$ is the same size as $B_i$, we see that
$T^{-1}AT$ and $T^{-1}BT$ are both diagonal, where $T$ is the direct sum

$$T = \begin{bmatrix} T_1 & & & 0 \\ & T_2 & & \\ & & \ddots & \\ 0 & & & T_k \end{bmatrix} \qquad (1.3.15)$$

Note that $T_i^{-1} \lambda_i I \, T_i = \lambda_i I$.
The converse is included in an earlier exercise. □
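
The key step of the proof, that commuting with a diagonal $A$ forces $b_{ij} = 0$ whenever $\lambda_i \ne \lambda_j$, can be observed directly. The sketch below uses the illustrative choice $A = \operatorname{diag}(1, 1, 2)$ and two hypothetical matrices: a full matrix, which fails to commute with $A$, and a matrix of the block form (1.3.13), which commutes with it.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def commutator(X, Y):
    """Return XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(XY, YX)]

lam = [1, 1, 2]                     # the eigenvalue 1 is repeated contiguously
n = len(lam)
A = [[lam[i] if i == j else 0 for j in range(n)] for i in range(n)]
zero = [[0] * n for _ in range(n)]

B_full = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Entry by entry, (AB - BA)_{ij} = (lam_i - lam_j) * b_{ij}.
predicted = [[(lam[i] - lam[j]) * B_full[i][j] for j in range(n)]
             for i in range(n)]
formula_holds = commutator(A, B_full) == predicted
full_commutes = commutator(A, B_full) == zero

# Zero wherever lam_i != lam_j: a 2-by-2 block followed by a 1-by-1 block.
B_block = [[1, 2, 0], [3, 4, 0], [0, 0, 5]]
block_commutes = commutator(A, B_block) == zero
print(formula_holds, full_commutes, block_commutes)
```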


We conclude this section by extending (1.3.12) to larger sets of 
matrices and by making a weaker statement in the case of nondiagonal- 
izable matrices. 


1.3.16 Definitions. A family $\mathcal{F} \subseteq M_n$ of matrices is an
arbitrary (finite or infinite) set of matrices, and a commuting family is
one in which each pair in the set commutes under multiplication. A
subspace $W \subseteq C^n$ is said to be $A$-invariant, for $A \in M_n$, if
$Aw \in W$ for every $w \in W$, and $W$ is called $\mathcal{F}$-invariant, for a
family $\mathcal{F} \subseteq M_n$, if $W$ is $A$-invariant for each
$A \in \mathcal{F}$.

Notice that if $A \in M_n$, each nonzero element of a one-dimensional
$A$-invariant subspace of $C^n$ is an eigenvector of $A$.


Exercise. Let $A \in M_n$. If $W$ is an $A$-invariant subspace of $C^n$ of
dimension at least 1, show that there is an eigenvector of $A$ in $W$. Hint:
Choose a basis for $W$ and consider the matrix that is the basis
representation of the linear transformation $T: w \to Aw$ on $W$. Argue that
this matrix has an eigenvalue. Crucial point: Why is $T$ a linear
transformation on $W$?


A key observation is the following lemma. 


1.3.17 Lemma. If $\mathcal{F} \subseteq M_n$ is a commuting family, then there
is a vector $x \in C^n$ that is an eigenvector of every $A \in \mathcal{F}$.


Proof: Let $W \subseteq C^n$ be an $\mathcal{F}$-invariant subspace of minimum
positive dimension; such a $W$ exists, but it need not be unique. Since $C^n$
is itself $\mathcal{F}$-invariant, we know there is an $\mathcal{F}$-invariant
subspace of dimension $n$. If there is one of dimension $n-1$, then ask if
there is one of dimension $n-2$, and so forth. We actually show that every
nonzero vector in $W$ is an eigenvector of every $A \in \mathcal{F}$, which is
sufficient to complete the proof.

If this is not the case, then for some matrix $A \in \mathcal{F}$, not every
nonzero vector in $W$ is an eigenvector of $A$. But, since $W$ is
$\mathcal{F}$-invariant, it is $A$-invariant, and there is an $x \ne 0$ in $W$
such that $Ax = \lambda x$ for some eigenvalue $\lambda$. Define
$W_0 = \{y \in W : Ay = \lambda y\}$, so that $x \in W_0$ and $W_0 \subseteq W$
is a subspace. Because of the assumption about $A$, $W_0 \ne W$, so that the
(positive) dimension of $W_0$ is strictly smaller than that of $W$. If
$B \in \mathcal{F}$, we have $Bx \in W$ if $x \in W_0$, since $W_0 \subseteq W$ and
$W$ is $\mathcal{F}$-invariant. But then, since $\mathcal{F}$ is a commuting
family, $A(Bx) = (AB)x = (BA)x = B(Ax) = B(\lambda x) = \lambda(Bx)$ and we
conclude that $Bx \in W_0$. It follows that $W_0$ is $\mathcal{F}$-invariant. But
since $W_0$ has strictly lower positive dimension than $W$, we reach a
contradiction, which completes the proof. □


Lemma (1.3.17) concerns commuting families of arbitrary cardinality. 
In particular, if F ={A, B} is a family of only two matrices, it says that 
any pair of commuting matrices has a common eigenvector. Theorem 
(1.3.12) says that if A and B not only commute but are each diagonal- 
izable as well, then they are simultaneously diagonalizable. Our next 
result shows that there is nothing special about commuting families of 
two diagonalizable matrices; this result generalizes to families of arbi- 
trary cardinality. 


1.3.18 Definition. A simultaneously diagonalizable family
$\mathcal{F} \subseteq M_n$ is a family for which there is a single
nonsingular matrix $S \in M_n$ such that $S^{-1}AS$ is diagonal for every
$A \in \mathcal{F}$.


1.3.19 Theorem. Let $\mathcal{F} \subseteq M_n$ be a family of diagonalizable
matrices. Then $\mathcal{F}$ is a commuting family if and only if it is a
simultaneously diagonalizable family.


Proof: If $\mathcal{F}$ is simultaneously diagonalizable, then it is a
commuting family by a previous exercise. We prove the converse by
induction on $n$. If $n = 1$, there is nothing to prove since every family is
both commuting and diagonal. Let us suppose that $n \ge 2$ and that, for
$k = 1, 2, \ldots, n-1$, the assertion has been proved for all families of
k-by-k matrices that satisfy the hypotheses. If every matrix in
$\mathcal{F}$ is a scalar matrix, there is nothing to prove, so we may assume
that $A \in \mathcal{F}$ is a given n-by-n diagonalizable matrix with at least
two distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, $2 \le k \le n$,
that $AB = BA$ for every matrix $B \in \mathcal{F}$, and that each
$B \in \mathcal{F}$ is diagonalizable. Using the same argument as in (1.3.12),
we can reduce to the case in which $A$ is actually a diagonal matrix, any
multiple eigenvalues of $A$ occur contiguously, and the ordering of the
eigenvalues is fixed, that is, $A$ has the form (1.3.14). Since every
$B \in \mathcal{F}$ commutes with $A$, the argument in (1.3.12) shows that every
$B \in \mathcal{F}$ has the form of a direct sum (1.3.13) of matrices, each of
size $n-1$ or less. The sizes and locations of the blocks in (1.3.13) are
determined solely by the multiplicities and ordering of the eigenvalues of
$A$ and are therefore the same for all $B \in \mathcal{F}$. Since all the
matrices $B \in \mathcal{F}$ commute (not just with $A$), and since every
$B \in \mathcal{F}$ has the form of a direct sum (1.3.13), each of the $k$
direct summand blocks of any matrix in $\mathcal{F}$ must commute with the
corresponding block of every other matrix in $\mathcal{F}$, and each of these
blocks is diagonalizable by (1.3.10). By the induction hypothesis, there
are $k$ similarity matrices $T_1, T_2, \ldots, T_k$ of appropriate size, each of
which diagonalizes the corresponding block of every matrix in
$\mathcal{F}$. As in (1.3.15), the direct sum $T_1 \oplus \cdots \oplus T_k$
diagonalizes every matrix in $\mathcal{F}$. □


Remarks: Two important issues related to this section will be deferred
until Chapter 3: (1) Given $A, B \in M_n$, how can we determine if $A$ is
similar to $B$? This is a motivation for canonical forms under similarity.
(2) How can we tell if a given matrix $A \in M_n$ is diagonalizable without
computing its eigenvectors?

As a final remark on commutativity, we observe that although AB and 
BA are not necessarily the same matrix (and not even necessarily the 
same size even when both are defined), they are as much the same as 
possible from the point of view of their eigenvalues. If A and B are both 
square, AB and BA have exactly the same eigenvalues. 


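1.3.20 Theorem. Suppose that $A \in M_{m,n}$ and $B \in M_{n,m}$ with $m \le n$.
Then $BA$ has the same eigenvalues as $AB$, counting multiplicity, together
with an additional $n - m$ eigenvalues equal to 0; that is,
$p_{BA}(t) = t^{n-m} p_{AB}(t)$. If $m = n$ and at least one of $A$ or $B$ is
nonsingular, then $AB$ and $BA$ are similar.


Proof: Consider the following two identities involving block matrices in
$M_{m+n}$:

$$\begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix} \begin{bmatrix} I & A \\ 0 & I \end{bmatrix} = \begin{bmatrix} AB & ABA \\ B & BA \end{bmatrix}$$

$$\begin{bmatrix} I & A \\ 0 & I \end{bmatrix} \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix} = \begin{bmatrix} AB & ABA \\ B & BA \end{bmatrix}$$

Since the block matrix

$$\begin{bmatrix} I & A \\ 0 & I \end{bmatrix} \in M_{m+n}$$

is nonsingular (all its eigenvalues are $+1$), we conclude that

$$\begin{bmatrix} I & A \\ 0 & I \end{bmatrix}^{-1} \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix} \begin{bmatrix} I & A \\ 0 & I \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix}$$

that is, the two (m+n)-by-(m+n) matrices

$$C_1 = \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix} \quad \text{and} \quad C_2 = \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix}$$

are similar. The eigenvalues of $C_1$ are the eigenvalues of $AB$ together with
$n$ zeroes. The eigenvalues of $C_2$ are the eigenvalues of $BA$ together with $m$
zeroes. Since the eigenvalues of $C_1$ and $C_2$ are the same by (1.3.4),
including multiplicities, the main assertion of the theorem follows. The
final assertion follows from the observation that $AB = A(BA)A^{-1}$ if $A$ is
nonsingular and $m = n$. □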
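
The smallest nontrivial instance of (1.3.20) has $m = 1$ and $n = 2$. The sketch below picks one illustrative row vector $A$ and column vector $B$, computes the characteristic polynomials of the 1-by-1 matrix $AB$ and the 2-by-2 matrix $BA$, and compares $p_{BA}(t)$ with $t^{n-m}p_{AB}(t)$. The helper `char_poly` is hypothetical and handles only 1-by-1 and 2-by-2 matrices.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(M):
    """Coefficients of det(tI - M), highest degree first (1x1 or 2x2 only)."""
    if len(M) == 1:
        return [1, -M[0][0]]
    (a, b), (c, d) = M
    return [1, -(a + d), a * d - b * c]

A = [[1, 2]]          # A in M_{1,2}: m = 1, n = 2
B = [[3], [4]]        # B in M_{2,1}

AB = matmul(A, B)     # 1-by-1
BA = matmul(B, A)     # 2-by-2

p_AB = char_poly(AB)
p_BA = char_poly(BA)

# Multiplying a polynomial by t^(n-m) appends n - m zero coefficients.
shifted = p_AB + [0] * (len(BA) - len(AB))
print(p_AB, p_BA, p_BA == shifted)
```

Here $BA$ has the single eigenvalue of $AB$ together with one additional eigenvalue equal to 0.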


Problems 


1. If $A, B \in M_n$ and if $A$ and $B$ commute, show that $A$ and any
polynomial in $B$ commute.


2. Let $A, B \in M_n$ with $\sigma(A) = \{\lambda_1, \ldots, \lambda_n\}$ and
$\sigma(B) = \{\mu_1, \ldots, \mu_n\}$. If $A$ and $B$ are diagonalizable and
commute, show that the eigenvalues of $A + B$ are

$$\lambda_1 + \mu_{i_1}, \; \lambda_2 + \mu_{i_2}, \; \ldots, \; \lambda_n + \mu_{i_n}$$

for some permutation $i_1, \ldots, i_n$ of $1, \ldots, n$.


3. If $A \in M_n$ and $A = S^{-1}DS$, $D = \operatorname{diag}(d_1, \ldots, d_n)$, and
$p(\cdot)$ is a polynomial, show that $p(A) = S^{-1}p(D)S$ and that
$p(D) = \operatorname{diag}(p(d_1), \ldots, p(d_n))$. This provides a simple way
of evaluating $p(A)$ if one can diagonalize $A$.


4. Give an example of two commuting matrices that are not simultaneously
diagonalizable. Does this contradict Theorem (1.3.12)?


5. If $A \in M_n$ has distinct eigenvalues and if $A$ commutes with a given
matrix $B \in M_n$, show that $B$ is a polynomial in $A$ of degree at most $n-1$.
Hint: Use the method employed in the proof of Theorem (1.3.12) to
show that $B$ and $A$ must be simultaneously diagonalizable. Then recall
that, given distinct numbers $\alpha_1, \ldots, \alpha_n$ and numbers
$\beta_1, \ldots, \beta_n$, there is a (Lagrange interpolating) polynomial
$p(\cdot)$ of degree at most $n-1$ such that $p(\alpha_i) = \beta_i$. See (0.9.11).


6. If $A \in M_n$ is diagonalizable, consider the characteristic polynomial
$p_A(t)$ and show that $p_A(A)$ is the zero matrix.


7. A matrix $A \in M_n$ is a square root of $B \in M_n$ if $A^2 = B$. Show that
every diagonalizable matrix in $M_n$ has a square root.




8. If $A, B \in M_n$ and if at least one has distinct eigenvalues (no
assumption, even of diagonalizability, about the other), show that $A$ and
$B$ commute if and only if they are simultaneously diagonalizable.
Suggestion: One direction is easy; for the other, try the following type of
argument as an alternate to that used for (1.3.12). Suppose $B$ has distinct
eigenvalues, $\lambda \in \sigma(B)$, and $Bx = \lambda x$ with $x \ne 0$. Then
$B(Ax) = A(Bx) = A\lambda x = \lambda Ax$, which implies that $Ax$ is also an
eigenvector of $B$ associated with $\lambda$. Since there cannot be two linearly
independent such vectors (because $\lambda$ has multiplicity 1), $Ax$ must be a
multiple $\mu$ of $x$; that is, $Ax = \mu x$. Thus, every eigenvector of $B$ is
also an eigenvector of $A$, and $A$ is diagonalizable by the same matrix of
eigenvectors that diagonalizes $B$. See Problems 12 and 13 for another
approach to the same fact.


9. Provide the details for the following alternate proof of Theorem
(1.3.20). (a) First, suppose that $A, B \in M_n$ and that at least one of them
is nonsingular. Show that $AB$ is similar to $BA$, and hence the
characteristic polynomials of $AB$ and $BA$ are the same. Hint: If $A$ is
nonsingular, $BA = A^{-1}(AB)A$. Hence $\sigma(AB) = \sigma(BA)$ in this case.
(b) Consider the
singular matrices A = lo o] and B = E S]. Show that AB and BA are not 


similar, but that they do have the same eigenvalues. (c) Show that if
$A, B \in M_n$, $AB$ and $BA$ have the same eigenvalues, counting
multiplicities. Hint: Consider the following analytic argument. For all
sufficiently small $\epsilon > 0$, $A_\epsilon = A + \epsilon I$ is nonsingular;
thus, $A_\epsilon B$ and $BA_\epsilon$ are similar and hence the characteristic
polynomials of $A_\epsilon B$ and $BA_\epsilon$ are the same. If we now let
$\epsilon \to 0$, similarity may fail in the limit, but equality of the
characteristic polynomials continues to hold since
$p_{A_\epsilon B}(t) = \det(tI - A_\epsilon B)$ depends continuously on
$\epsilon$. Thus, $AB$ and $BA$ have the same characteristic polynomials and
therefore the same eigenvalues, counting multiplicities. (d) Finally, if
$A \in M_{m,n}$ and $B \in M_{n,m}$, show that $AB$ and $BA$ have the same
eigenvalues, counting multiplicities, except that $BA$ has an additional
$n - m$ eigenvalues equal to 0 (assuming $n > m$); equivalently,
$p_{BA}(t) = t^{n-m} p_{AB}(t)$. Hint: Make n-by-n matrices out of both $A$ (by
appending 0 rows) and $B$ (by appending 0 columns), apply the last result,
and compare the two new products (appropriately partitioned) to the two
old ones.


10. Use (1.3.8) to prove the following generalization: Let $A \in M_n$ be
given, and let $\lambda_1, \ldots, \lambda_k$ be distinct eigenvalues of $A$. For
each $i = 1, 2, \ldots, k$ suppose $\{x_1^{(i)}, x_2^{(i)}, \ldots, x_{n_i}^{(i)}\}$ is
an independent set of $n_i \ge 1$ eigenvectors of $A$ corresponding to the
eigenvalue $\lambda_i$. Show that the union of the sets
$\{x_1^{(1)}, \ldots, x_{n_1}^{(1)}\} \cup \cdots \cup \{x_1^{(k)}, \ldots, x_{n_k}^{(k)}\}$
is an independent set. Hint: If some linear combination is zero, say

$$0 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} c_{ij} x_j^{(i)} = \sum_{i=1}^{k} y^{(i)}$$

use (1.3.8) to show that each $y^{(i)} = 0$.


11. Provide the details for the following alternate, and more
constructive, proof of Lemma (1.3.17): (a) Show that if $A, B \in M_n$ commute,
then they have a common eigenvector. Hint: Let $x$ be an eigenvector of
$A$, $Ax = \lambda x$, $x \ne 0$, and consider the sequence
$x, Bx, B^2x, B^3x, \ldots$. There must be a first element of this sequence
that is dependent upon its predecessors, say $B^kx$, so
$S = \operatorname{Span}\{x, Bx, B^2x, \ldots, B^{k-1}x\}$ is a subspace invariant
under $B$ and hence there is some nonzero $y \in S$ with $By = \mu y$. But
$AB^jx = B^jAx = B^j\lambda x = \lambda B^jx$, so every nonzero vector in $S$ is
an eigenvector for $A$ as well. (b) If
$\mathcal{F} = \{A_1, A_2, \ldots, A_m\}$ is a finite commuting family, use
induction to show that there is a common eigenvector for all $A_i$. Hint: If
$y \ne 0$ is a common eigenvector for $A_1, A_2, \ldots, A_{m-1}$, consider the
sequence $y, A_my, A_m^2y, A_m^3y, \ldots$ as in (a). (c) If
$\mathcal{F} \subseteq M_n$ is a commuting family that does not have finite
cardinality, observe that there cannot be more than $n^2$ linearly
independent matrices in $\mathcal{F}$. Select a maximal independent set
and use (b); argue that a common eigenvector for this finite set is a
common eigenvector for all of $\mathcal{F}$.


12. If $A = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \in M_n$ has
$n$ distinct diagonal entries, use the ideas from the proof of Theorem
(1.3.12) to show that $AB = BA$ for some $B \in M_n$ if and only if $B$ is
itself diagonal (but not necessarily with distinct diagonal entries).


13. Suppose $A \in M_n$ has $n$ distinct eigenvalues. If $AB = BA$ for some
$B \in M_n$, show that $B$ is diagonalizable and that $A$ and $B$ are
simultaneously diagonalizable. Hint: If $A = S\Lambda S^{-1}$ with $\Lambda$
diagonal, show that $\Lambda$ commutes with $S^{-1}BS$ and use Problem 12.


14. Extend the result of Problem 13 to a commuting family
$\mathcal{F} \subseteq M_n$ that contains at least one matrix with $n$ distinct
eigenvalues. Compare this result to Theorem (1.3.19), which assumes that
all the members of the family are diagonalizable. Is this a stronger
result?


15. Consider the block diagonal matrix
$A = \operatorname{diag}(\lambda_1 I_1, \lambda_2 I_2, \ldots, \lambda_k I_k) \in M_n$
with $I_i \in M_{n_i}$, $\lambda_i \ne \lambda_j$ if $i \ne j$, and
$n_1 + \cdots + n_k = n$. Show that $AB = BA$ for some $B \in M_n$ if and only if
the matrix $B$ has the block diagonal form
$B = \operatorname{diag}(B_1, B_2, \ldots, B_k)$ with $B_i \in M_{n_i}$,
$i = 1, 2, \ldots, k$. How is this result related to Problem 12?


16. Let $A, B \in M_n$, and suppose that either $A$ or $B$ is nonsingular. If
$AB$ is diagonalizable, show that $BA$ is also diagonalizable. Consider




A= l | and B= lo | to show that this need not be true if both A and 


B are singular. 


1.4 Eigenvectors 


Thus far, the eigenvalues of $A \in M_n$ have been emphasized relative to the
eigenvectors. The eigenvectors are also important, not only for their role
in diagonalizability, but also for their utility in a variety of
applications. We discuss them somewhat further here, but we begin with an
additional observation about eigenvalues.


1.4.1 Observation. Let $A \in M_n$. (a) The eigenvalues of $A^T$ are the same
as those of $A$, counting multiplicities. (b) The eigenvalues of $A^*$ are the
complex conjugates of the eigenvalues of $A$, counting multiplicities.


Proof: Since $\det(tI - A^T) = \det(tI - A)^T = \det(tI - A)$,
$p_{A^T}(t) = p_A(t)$, and (a) follows. Similarly,
$\det(\bar{t}I - A^*) = \det[(tI - A)^*] = \overline{\det(tI - A)}$,
which implies $p_{A^*}(\bar{t}) = \overline{p_A(t)}$, and (b) follows. □


Exercise. If $x, y \in C^n$ are both eigenvectors of $A \in M_n$ corresponding to
the eigenvalue $\lambda$, show that any nonzero linear combination of $x$ and
$y$ is also an eigenvector corresponding to $\lambda$. Conclude that the set of
all eigenvectors associated with a particular $\lambda \in \sigma(A)$, together
with the 0 vector, is a subspace of $C^n$.


Exercise. Observe that the subspace described in the preceding exercise is
precisely the null space of $A - \lambda I$.


1.4.2 Definition. Let $A \in M_n$. For a given $\lambda \in \sigma(A)$, the set of
all vectors $x \in C^n$ satisfying $Ax = \lambda x$ is called the eigenspace of
$A$ corresponding to the eigenvalue $\lambda$. Note that every nonzero element
of this eigenspace is an eigenvector of $A$ corresponding to $\lambda$.


Exercise. Show that an eigenspace of $A$ corresponding to an eigenvalue
$\lambda$ is an $A$-invariant subspace, but not conversely. Show that a minimal
$A$-invariant subspace (containing no strictly lower-dimensional,
nontrivial $A$-invariant subspace) is the span of a single eigenvector of $A$.
Hint: Use the exercise preceding (1.3.17).


If one knows an eigenvalue $\lambda$ of $A \in M_n$, a conceptually simple, but
not necessarily practical, approach to computing an associated eigenvector
is to solve the linear system

$$(A - \lambda I)x = 0$$

The set of all solutions constitutes the eigenspace.


1.4.3 Definition. The dimension of the eigenspace of $A \in M_n$
corresponding to the eigenvalue $\lambda$ is called the geometric multiplicity
of the eigenvalue $\lambda$. The multiplicity of $\lambda$ as a zero of the
characteristic polynomial $p_A(\cdot)$ (the notion of multiplicity to which we
have referred thus far) is called the algebraic multiplicity of the
eigenvalue $\lambda$. In general, these two concepts are different. If the
term multiplicity is used without qualification in reference to an
eigenvalue, it usually means the algebraic multiplicity. We shall follow
this convention.

Notice that the geometric multiplicity is just the maximum number of 
linearly independent eigenvectors associated with an eigenvalue. 
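
Because the eigenspace is the null space of $A - \lambda I$, its dimension can be computed as $n - \operatorname{rank}(A - \lambda I)$; this uses the standard rank-nullity relation, which we assume here. The sketch below does this exactly with rational arithmetic. The function names `rank` and `geometric_multiplicity` and the test matrices are illustrative, and the routine is meant for small hand-checkable examples only.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0                                   # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(A, lam):
    """Dimension of the null space of A - lam*I, computed as n - rank."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

N = [[0, 1], [0, 0]]      # eigenvalue 0 has algebraic multiplicity 2
Z = [[0, 0], [0, 0]]      # likewise
print(geometric_multiplicity(N, 0), geometric_multiplicity(Z, 0))
```

If `lam` is not an eigenvalue of `A`, the function returns 0, which is consistent with the eigenspace being trivial in that case.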


Exercise. Show that the geometric multiplicity of an eigenvalue $\lambda$ of
$A \in M_n$ is never more, and can be less, than its algebraic multiplicity.
If the algebraic multiplicity is at least 1, then the geometric
multiplicity is at least 1. Hint: Suppose the geometric multiplicity of
$\lambda$ is $k$, and let $S \in M_n$ be nonsingular with its first $k$ columns
linearly independent eigenvectors of $A$ corresponding to $\lambda$. Use
reasoning similar to that used in (1.3.7) to show that $S^{-1}AS$ has the
form $\begin{bmatrix} \lambda I & * \\ 0 & * \end{bmatrix}$, $I \in M_k$, and
conclude that the algebraic multiplicity of $\lambda$ is at least $k$.


1.4.4 Definitions. A matrix $A \in M_n$, some eigenvalue of which has
strictly smaller geometric than algebraic multiplicity, is said to be
defective. If the geometric multiplicity is the same as the algebraic
multiplicity for each eigenvalue, $A$ is said to be nondefective. If each
eigenvalue of $A \in M_n$ has geometric multiplicity exactly 1 (regardless of
the algebraic multiplicity), $A$ is called nonderogatory. All of these
definitions are classical and enjoy somewhat limited current usage.

Notice that a nonderogatory, nondefective matrix is simply a matrix
with distinct eigenvalues. Also, a matrix $A \in M_n$ is diagonalizable if and
only if $A$ is nondefective. This is just a restatement of (1.3.7) that
emphasizes the necessity of the existence of enough linearly independent
eigenvectors associated with each eigenvalue.


1.4.5 Example. Even though $A$ and $A^T$ have the same eigenvalues,
their eigenvectors corresponding to a given eigenvalue may be very
different. For example, let

$$A = \begin{bmatrix} 2 & 3 \\ 0 & 4 \end{bmatrix}$$

Then the (one-dimensional) eigenspace of $A$ corresponding to the
eigenvalue 2 is spanned by $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, while the
corresponding eigenspace of $A^T$ is spanned by
$\begin{bmatrix} 2 \\ -3 \end{bmatrix}$.


Exercise. Verify the details of (1.4.5). 
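
One way to carry out part of this verification mechanically is sketched below. It takes $A$ from (1.4.5), with $x = [1, 0]^T$ and $y = [2, -3]^T$ as candidate spanning vectors for the eigenvalue 2 (obtained by solving $(A - 2I)x = 0$ and $(A^T - 2I)y = 0$ by hand), and checks the eigenvalue-eigenvector equation for each. The helper names are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def is_eigenpair(M, lam, v):
    """True when v is nonzero and M v = lam v."""
    return any(x != 0 for x in v) and matvec(M, v) == [lam * x for x in v]

A = [[2, 3], [0, 4]]
x = [1, 0]       # candidate eigenvector of A for the eigenvalue 2
y = [2, -3]      # candidate eigenvector of A^T for the eigenvalue 2

x_ok = is_eigenpair(A, 2, x)
y_ok = is_eigenpair(transpose(A), 2, y)
y_fails_for_A = not is_eigenpair(A, 2, y)   # y is not an eigenvector of A itself
print(x_ok, y_ok, y_fails_for_A)
```

That each eigenspace is one-dimensional still has to be argued separately, for example from the rank of $A - 2I$.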


It should be clear that the theory of eigenvalues and eigenvectors we 
have developed thus far could have been developed in parallel for left 
multiplication by row vectors. The eigenvalues would be the same, but 
the eigenvectors would, in general, be different (even allowing for rows 
vs. columns). 


1.4.6 Definition. A nonzero vector $y \in C^n$ is called a left eigenvector of
$A \in M_n$ corresponding to $\lambda \in \sigma(A)$ if

$$y^*A = \lambda y^*$$

If necessary for clarity, we refer to the vector $x$ of (1.1.3) as a right
eigenvector. When the context does not require distinction, we just say
eigenvector.


Exercise. Show that a left eigenvector $y$ corresponding to the eigenvalue
$\lambda$ of $A \in M_n$ is a right eigenvector of $A^*$ corresponding to
$\bar{\lambda}$, and also that $\bar{y}$ is a right eigenvector of $A^T$
corresponding to $\lambda$. Show by example that, even for $A \in M_n(R)$, right
and left eigenvectors need not be the same.


Recall from (0.6.2) that two vectors $x, y \in C^n$ are called orthogonal if
$y^*x = 0$. The following result is known as the principle of
biorthogonality.


1.4.7 Theorem. If $A \in M_n$ and if $\lambda, \mu \in \sigma(A)$, with
$\lambda \ne \mu$, then any left eigenvector of $A$ corresponding to $\mu$ is
orthogonal to any right eigenvector of $A$ corresponding to $\lambda$.


Proof: Let $y \in C^n$ be a left eigenvector of $A$ corresponding to $\mu$ and let
$x \in C^n$ be a right eigenvector of $A$ corresponding to $\lambda$. Manipulate
$y^*Ax$ in two ways:

$$\begin{aligned}
y^*Ax &= y^*(Ax) = \lambda(y^*x) \\
&= (\mu y^*)x = \mu(y^*x)
\end{aligned}$$

Since $\lambda \ne \mu$, the only possible way to have
$\lambda y^*x = \mu y^*x$ is to have $y^*x = 0$; that is, $x$ and $y$ are
orthogonal. □
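
Biorthogonality can be seen on a small real example, where $y^* = y^T$. The sketch below uses the illustrative matrix $\begin{bmatrix} 2 & 3 \\ 0 & 4 \end{bmatrix}$ with right and left eigenvectors worked out by hand for its eigenvalues 2 and 4; it confirms the eigenvector equations and then forms the products $y^*x$. The helper names are hypothetical.

```python
def matvec(M, v):
    """Column-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vecmat(y, M):
    """Row-vector product y^T M, returned as a list."""
    return [sum(y[i] * M[i][j] for i in range(len(y))) for j in range(len(M[0]))]

def dot(u, v):
    """Real inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

A = [[2, 3], [0, 4]]
x2, x4 = [1, 0], [3, 2]        # right eigenvectors for the eigenvalues 2 and 4
y2, y4 = [2, -3], [0, 1]       # left eigenvectors for the eigenvalues 2 and 4

right_ok = (matvec(A, x2) == [2 * c for c in x2]
            and matvec(A, x4) == [4 * c for c in x4])
left_ok = (vecmat(y2, A) == [2 * c for c in y2]
           and vecmat(y4, A) == [4 * c for c in y4])

cross = (dot(y4, x2), dot(y2, x4))   # left and right vectors for different eigenvalues
same = (dot(y2, x2), dot(y4, x4))    # left and right vectors for the same eigenvalue
print(right_ok, left_ok, cross, same)
```

The theorem says nothing about a left and a right eigenvector for the same eigenvalue; in this example those products happen to be nonzero.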


Exercise. If $A^* = A \in M_n$, that is, $A$ is Hermitian, and if $A$ has distinct
eigenvalues, show that there are $n$ pairwise orthogonal (right)
eigenvectors of $A$. Recall from Problem 8 of Section (1.1) that the
eigenvalues of $A$ are all real. Hint: Since $A^* = A$, left eigenvectors
coincide with right eigenvectors. Apply (1.4.7).


We shall see in the next chapter that the assumption of distinct eigen- 
values is unnecessary in the statement of the above exercise. 

We next note that eigenvectors transform under similarity in a simple 
way. The eigenvalues are, of course, unchanged by similarity. 


1.4.8 Theorem. Let $A, B \in M_n$. If $x \in C^n$ is an eigenvector
corresponding to $\lambda \in \sigma(B)$ and if $B$ is similar to $A$ via $S$, then
$Sx$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$.


Proof: If $B = S^{-1}AS$ and if $Bx = \lambda x$, then $S^{-1}ASx = \lambda x$, or
$ASx = \lambda Sx$. Since $S$ is nonsingular and $x \ne 0$, $Sx \ne 0$, and hence
$Sx$ is an eigenvector of $A$. □
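
A concrete instance of (1.4.8) is sketched below. The matrices $A$ and $S$ are illustrative choices; the code forms $B = S^{-1}AS$ exactly, takes an eigenvector $x$ of $B$ found by inspection, and checks that $Sx$ is an eigenvector of $A$ for the same eigenvalue. Helper names are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inv2(S):
    """Exact inverse of a nonsingular 2-by-2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [0, 3]]
S = [[1, 2], [1, 3]]
B = matmul(matmul(inv2(S), A), S)        # B = S^{-1} A S is upper triangular here

lam = 3
x = [1, 0]                               # B x = 3 x, by inspection of B
Sx = matvec(S, x)

x_is_eigvec_of_B = matvec(B, x) == [lam * c for c in x]
Sx_is_eigvec_of_A = matvec(A, Sx) == [lam * c for c in Sx]
print(B, Sx, x_is_eigvec_of_B, Sx_is_eigvec_of_A)
```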


Exercise. Verify that $e = [1, 1, 1]^T$ is an eigenvector of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{bmatrix}$$

If $D = \operatorname{diag}(1, 2, 3)$, determine an entry-wise positive
eigenvector of $D^{-1}AD$.


As a final observation in this section, we note that eigenvectors can be 
used to gain information about eigenvalues of principal submatrices. 
This information yields another proof of the inequality between the geo- 
metric and algebraic multiplicities of an eigenvalue. 


1.4.9 Theorem. Let $A \in M_n$ and $\lambda \in C$ be given, and let $k \ge 1$ be a
given positive integer. Consider the following three statements:

(a) $\lambda$ is an eigenvalue of $A$ of geometric multiplicity at least $k$.

(b) If $\hat{A} \in M_m$ is a principal submatrix of $A$ and if $m > n - k$, then
$\lambda$ is an eigenvalue of $\hat{A}$.

(c) $\lambda$ is an eigenvalue of $A$ of algebraic multiplicity at least $k$.




Then (a) implies (b), and (b) implies (c). In particular, the algebraic mul- 
tiplicity of an eigenvalue is at least as great as its geometric multiplicity. 


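Proof: Assume (a) and let Â ∈ M_m be a principal submatrix of A with
m > n - k. Using a permutation similarity and (1.4.8), there is no loss of
generality to assume that Â appears in the upper left corner of A. Let
v_1, ..., v_k be linearly independent eigenvectors of A corresponding to
the eigenvalue λ. Partition A and each vector v_i as

\[
A = \begin{bmatrix} \hat{A} & * \\ * & * \end{bmatrix}, \quad \hat{A} \in M_m;
\qquad
v_i = \begin{bmatrix} u_i \\ w_i \end{bmatrix}, \quad u_i \in C^m, \; w_i \in C^{n-m}, \; i = 1, 2, \ldots, k
\]

The vectors w_1, ..., w_k are dependent because they are k vectors in a
space of dimension n - m < n - (n - k) = k, so there are scalars
a_1, ..., a_k ∈ C, not all zero, such that a_1 w_1 + ... + a_k w_k = 0.
Then

\[
v = a_1 v_1 + \cdots + a_k v_k = \begin{bmatrix} u \\ 0 \end{bmatrix} \ne 0
\]

where u = a_1 u_1 + ... + a_k u_k ≠ 0 and Av = λv. Writing this equation
in partitioned form gives

\[
\begin{bmatrix} \hat{A} & * \\ * & * \end{bmatrix}
\begin{bmatrix} u \\ 0 \end{bmatrix}
= \begin{bmatrix} \hat{A} u \\ * \end{bmatrix}
= \begin{bmatrix} \lambda u \\ 0 \end{bmatrix}
\]

This shows that λ is an eigenvalue of Â, which is the assertion in (b).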

Now assume (b) and recall the identity (1.2.13), which relates the
derivative of the characteristic polynomial p_A(t) to the characteristic
polynomials p_{A_i}(t) of the n principal submatrices A_1, ..., A_n of A.
If k = 1, there is nothing to prove. If k > 1, then (b) says that λ is an
eigenvalue of each A_i and hence p_{A_i}(λ) = 0 and p_A'(λ) = 0. If k > 2,
differentiate the identity (1.2.13) to get

\[
p_A''(t) = \sum_{i=1}^{n} p_{A_i}'(t) \tag{1.4.10}
\]

and use (1.2.13) to replace each derivative on the right-hand side with a
sum of characteristic polynomials of principal submatrices of each A_i.
Since a principal submatrix of A_i, with one row and column deleted, is a
principal submatrix of A of size n - 2, the assumption in (b) and the
identity (1.2.13) applied to each A_i permit us to conclude that
p_A''(λ) = 0. Repetition of this argument shows that the successive
derivatives p_A^{(i)}(λ) vanish for i = 0, 1, ..., k - 1, and hence λ has
algebraic multiplicity at least k. □


Problems 


1. Show that A ∈ M_n has rank 1 if and only if there exist two nonzero
vectors x, y ∈ C^n such that

A = xy*

Show that (a) such an A has at most one nonzero eigenvalue (of algebraic
multiplicity 1); (b) this eigenvalue is y*x; and (c) x is a right and y
is a left eigenvector corresponding to this eigenvalue. What is the
geometric multiplicity of the eigenvalue 0?


2. Show that a matrix A ∈ M_n of rank k may be written as

\[
A = x^{(1)} y^{(1)*} + \cdots + x^{(k)} y^{(k)*}
\]

for x^(i), y^(i) ∈ C^n, i = 1, ..., k, that is, as a sum of k rank 1
matrices. Hint: Find k linearly independent rows and columns and use the
fact that the others can be written in terms of these.


3. Suppose that T ∈ M_n is upper triangular with distinct eigenvalues
t_11, ..., t_nn occurring from upper left to lower right, down the diagonal.
Show that there is a right eigenvector of T corresponding to t_ii whose last
n - i components are all 0, and a left eigenvector of T corresponding to t_ii
whose first i - 1 components are all 0. What if the t_ii are not distinct?


4. Show that the (only) eigenvalue 1 of the matrix displayed in (1.2.7b) 
has geometric multiplicity 1. Describe the associated eigenspace. 


5. Consider the block triangular matrix

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}, \quad A_{ii} \in M_{n_i}, \; i = 1, 2
\]

Show that the eigenvalues of A are those of A_11 together with those of
A_22, counting multiplicities. If x ∈ C^{n_1} is a right eigenvector of
A_11 corresponding to λ ∈ σ(A_11), and if y ∈ C^{n_2} is a left
eigenvector of A_22 corresponding to μ ∈ σ(A_22), show that

\[
\begin{bmatrix} x \\ 0 \end{bmatrix} \in C^{n_1 + n_2}
\quad \text{is a right eigenvector and} \quad
\begin{bmatrix} 0 \\ y \end{bmatrix}
\quad \text{is a left eigenvector}
\]

of A corresponding to λ and μ, respectively. What can you say about left
and right eigenvectors of A corresponding to λ and μ, respectively? Can
you generalize these observations to block triangular matrices with
arbitrarily many diagonal blocks?


6. If A ∈ M_n has component-wise positive left and right eigenvectors
corresponding to an eigenvalue of geometric multiplicity 1, show that A
has no other component-wise nonnegative eigenvectors, except for
multiples of these.


7. In this problem we outline the power method for finding the largest
eigenvalue and an associated eigenvector of A ∈ M_n. We make some
simplifying assumptions and allude to analytical details that can be made
precise. Suppose that A ∈ M_n has distinct eigenvalues λ_1, ..., λ_n, and
that there is exactly one eigenvalue λ_n of maximum modulus ρ(A). If
x^(0) ∈ C^n is not orthogonal to a left eigenvector associated with λ_n,
show that the sequence

\[
x^{(k+1)} = \frac{1}{\left( x^{(k)*} x^{(k)} \right)^{1/2}} \, A x^{(k)}, \qquad k = 0, 1, 2, \ldots
\]

approaches an eigenvector of A and the ratios of a given nonzero
component in the vectors Ax^(k) and x^(k) approach λ_n. Hint: Assume
without loss of generality that λ_n = 1 and let y^(1), ..., y^(n) be
linearly independent eigenvectors corresponding to λ_1, ..., λ_n. The
vector x^(0) may be written uniquely as

\[
x^{(0)} = a_1 y^{(1)} + \cdots + a_n y^{(n)}
\]

with a_n ≠ 0. Notice that
x^(k) = a_1 λ_1^k y^(1) + ... + a_n λ_n^k y^(n), except for a factor of
scale. Since |λ_i| < 1, |λ_i|^k → 0, i = 1, ..., n - 1 and this sum
approaches a multiple of y^(n).
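
The iteration described in this problem can be sketched in a few lines. This is only an illustration under the problem's assumptions (one eigenvalue of strictly largest modulus, and a starting vector not orthogonal to the corresponding left eigenvector); the function name, the fixed step count, and the test matrix are hypothetical choices.

```python
import math

def power_method(A, x0, steps=200):
    """Iterate x <- A x / ||x|| and return (eigenvalue estimate, unit vector)."""
    n = len(A)
    x = list(x0)
    for _ in range(steps):
        norm = math.sqrt(sum(abs(c) ** 2 for c in x))
        x = [sum(A[i][k] * x[k] for k in range(n)) / norm for i in range(n)]
    Ax = [sum(A[i][k] * x[k] for k in range(n)) for i in range(n)]
    # Ratio of corresponding components of Ax and x, taken where |x_p| is largest.
    p = max(range(n), key=lambda i: abs(x[i]))
    lam = Ax[p] / x[p]
    norm = math.sqrt(sum(abs(c) ** 2 for c in x))
    return lam, [c / norm for c in x]

# Illustrative symmetric matrix with eigenvalues 3 and 1 (not from the text).
A = [[2.0, 1.0], [1.0, 2.0]]
lam, v = power_method(A, [1.0, 0.0])
print(lam, v)
```

For this matrix the estimate approaches 3 and the vector approaches a multiple of (1, 1); the error shrinks like (1/3)^k, in line with the hint.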


8. Further eigenvalues (and eigenvectors) can be calculated using the
power method via a bridge, called deflation, which delivers a square
matrix, of size 1 smaller, whose eigenvalues are the remaining eigenvalues
of A ∈ M_n. Let λ_n and y^(n) be an eigenvalue and eigenvector of A
(calculated using the power method or otherwise), and let S ∈ M_n be
nonsingular with first column y^(n). Show that

\[
S^{-1} A S = \begin{bmatrix} \lambda_n & * \\ 0 & A_1 \end{bmatrix}
\]

and that the eigenvalues of A_1 ∈ M_{n-1} are λ_1, ..., λ_{n-1} in the
notation of Problem 7. Another eigenvalue may be calculated from A_1 and
the deflation repeated - and so on.


9. Let A ∈ M_n have eigenvalues λ_1, ..., λ_{n-1}, 0, so that
rank A ≤ n - 1, and suppose that the last row of A is a linear
combination of the others. (a) If A is partitioned as

\[
A = \begin{bmatrix} A_{11} & a_{12} \\ a_{21}^T & a_{22} \end{bmatrix}
\]

in which A_11 ∈ M_{n-1}, show that there is a vector b ∈ C^{n-1} such that

a_21^T = b^T A_11 and a_22 = b^T a_12

Interpret b in terms of a left eigenvector of A corresponding to 0.
(b) Show also that A_11 + a_12 b^T ∈ M_{n-1} has eigenvalues
λ_1, ..., λ_{n-1}. Hint: Consider similarity of A via

\[
\begin{bmatrix} I & 0 \\ b^T & 1 \end{bmatrix}
\]

Notice that this is another version of deflation since a matrix of smaller
size with the remaining eigenvalues is produced. If one knows one
eigenvalue λ of A, then the process described in this problem can be
applied to P(A - λI)P^{-1}, for a suitable permutation P.


10. Let T ∈ M_n be a nonsingular matrix whose columns are left
eigenvectors of A ∈ M_n. Show that the columns of (T*)^{-1} are right
eigenvectors of A.


CHAPTER 2 


Unitary equivalence and normal matrices 


We next study a special type of similarity that is intimately involved with 
many aspects of the application of matrix analysis. 


2.0 Introduction 


For a general nonsingular matrix S ∈ M_n, we made an initial study of
similarity via S in Chapter 1. For certain very special nonsingular
matrices, called unitary matrices, the inverse of S has a simple form:
S^{-1} = S*. Similarity of A ∈ M_n via a unitary matrix, A → S*AS, is not
only conceptually simpler (S* is much easier to evaluate than S^{-1}) than
general similarity, but it has a number of attractive features that will
become clearer through the development to follow. As a general rule,
unitary similarities are preferable to general similarities, and it is
therefore useful to know what can be achieved through unitary similarity.
Equivalence classes under unitary similarity are, however, finer than
under general similarity (two matrices can be similar but not unitarily
similar), and correspondingly less can be achieved. For this reason, we
shall return to study general similarity further in Chapter 3.

The transformation A → S*AS, A ∈ M_n, in which S is assumed to be
nonsingular but not necessarily unitary, is called *congruence and will be
studied in Chapter 4. This transformation, too, is an equivalence relation
on M_n with a number of attractive features (different from those of
similarity). It is important to realize that similarity by a unitary
matrix is both a similarity and a *congruence and this is the broadest
class of transformations that shares the properties of both.




2.1 Unitary matrices 

2.1.1 Definition. Recall that the vectors x_1, ..., x_k ∈ C^n form an
orthogonal set if x_i* x_j = 0 for all pairs 1 ≤ i < j ≤ k. If, in
addition, the vectors are normalized, x_i* x_i = 1, i = 1, ..., k, then the
set is called orthonormal.


Exercise. If {y_1, ..., y_k} is an orthogonal set of nonzero vectors, show
that the set {x_1, ..., x_k} defined by x_i = (y_i* y_i)^{-1/2} y_i,
i = 1, ..., k, is an orthonormal set.


2.1.2 Theorem. An orthonormal set of vectors is linearly independent. 


Proof: Suppose that {x_1, ..., x_k} is an orthonormal set, and suppose
0 = a_1 x_1 + ... + a_k x_k. Then

\[
0 = 0^* 0 = \sum_{i,j} \bar{a}_i a_j \, x_i^* x_j = \sum_{i=1}^{k} |a_i|^2 \, x_i^* x_i
\]

because the vectors x_i are orthogonal, and

\[
\sum_{i=1}^{k} |a_i|^2 \, x_i^* x_i = \sum_{i=1}^{k} |a_i|^2 = 0
\]

because the vectors x_i are normalized. Thus, all a_i = 0 and hence
{x_1, ..., x_k} is a linearly independent set. □


Exercise. Show that an orthogonal set of nonzero vectors is linearly 
independent. 


Exercise. Show that if x_1, ..., x_k ∈ C^n is an orthogonal set, then either
k ≤ n or at least k - n of the vectors x_i are equal to zero.


An independent set need not be orthonormal, of course, but one can 
apply the Gram-Schmidt orthonormalization procedure (0.6.4) to it and 
obtain an orthonormal set with the same span as the original set. 
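
As a rough illustration of this remark, the classical form of the Gram-Schmidt procedure can be sketched as follows. The exact formulation in (0.6.4) may differ; the function names and the sample vectors here are assumptions made for the example, and the input is assumed to be linearly independent.

```python
import math

def inner(x, y):
    """Usual inner product y*x on C^n: the sum of x_k times conj(y_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors (classical form)."""
    basis = []
    for v in vectors:
        w = [complex(c) for c in v]
        for q in basis:
            coeff = inner(v, q)          # component of v along the unit vector q
            w = [wc - coeff * qc for wc, qc in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wc / norm for wc in w])
    return basis

# Two independent vectors in C^3 (illustrative data).
Q = gram_schmidt([[1, 1j, 0], [1, 0, 1]])
print(abs(inner(Q[0], Q[1])), inner(Q[0], Q[0]).real, inner(Q[1], Q[1]).real)
```

Each output vector is a combination of the input vectors processed so far, so the span is unchanged at every stage.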


Exercise. Show that any k-dimensional real or complex vector space has 
an orthonormal basis (a basis consisting of an orthonormal set). 


2.1.3 Definition. A matrix U ∈ M_n is said to be unitary if U*U = I. If,
in addition, U ∈ M_n(R), U is said to be real orthogonal.

The unitary matrices in M_n form a remarkable and important set. We
list some of the basic equivalent conditions for U to be unitary in (2.1.4).


Exercise. If A ∈ M_n is given and BA = I for some B ∈ M_n, show that A is
nonsingular, B is unique, and AB = I. We write B = A^{-1}. Hint: If Ax = 0,
then x = Ix = BAx = 0. Nonsingularity implies that the equations Ax = y and
x^T A = y^T each have a unique solution for any given y ∈ C^n. Argue
(column by column) that AB_R = I and (row by row) that B_L A = I have
unique solutions B_L, B_R ∈ M_n. Now calculate B_L A B_R in two ways to
show that B_L = B_R.


2.1.4 Theorem. If U ∈ M_n, the following are equivalent:
(a) U is unitary;
(b) U is nonsingular and U* = U^{-1};
(c) UU* = I;
(d) U* is unitary;
(e) The columns of U form an orthonormal set;
(f) The rows of U form an orthonormal set; and
(g) For all x ∈ C^n, the Euclidean length of y = Ux is the same as that
of x; that is, y*y = x*x.


Proof: Statement (a) implies (b) since U^{-1} (when it exists) is that
unique matrix, left multiplication by which produces I; the definition of
unitary guarantees that U* is such a matrix. Since BA = I if and only if
AB = I (for A, B ∈ M_n), (b) implies (c). Since (U*)* = U, (c) implies that
U* satisfies the requirement necessary to be unitary; that is, (c) implies
(d). Since the converse of each of these implications is similarly
observed, (a)-(d) are equivalent.

Considering the mechanics of matrix multiplication and letting u^(i)
denote the ith column of U, i = 1, ..., n, the statement U*U = I means that

\[
u^{(i)*} u^{(j)} = \begin{cases} 0 & \text{if } j \ne i \\ 1 & \text{if } j = i \end{cases}
\]

Thus, U*U = I is another way of saying that the columns of U are
orthonormal, and (a) is equivalent to (e). Similarly, (d) and (f) are
equivalent.

If (a) holds and y = Ux, then y*y = x*U*Ux = x*Ix = x*x, so that (a)
implies (g). To verify the converse, on the other hand, requires somewhat
more elaborate calculation. Tools occurring later in this book would
make this fact more immediate, however. First consider the case n = 2.
Assuming (g) and letting x = [1, 0]^T, we find that
1 = x*x = y*y = x*U*Ux = the 1,1 entry of U*U. Similarly, letting
x = [0, 1]^T, we conclude that the 2,2 entry of U*U is also 1, and U*U
must have the form

\[
\begin{bmatrix} 1 & a \\ \bar{a} & 1 \end{bmatrix}
\]

where a is the inner product of column 1 and column 2 of U, and \bar{a} is
the inner product of column 2 and column 1. Letting x = [1, 1]^T in (g)
and again calculating, we find 2 = x*x = y*y = x*U*Ux = 2 + (a + \bar{a}).
Letting x = [1, i]^T, we find 2 = 2 + i(a - \bar{a}). Thus,
a + \bar{a} = 2 Re a = 0 and a - \bar{a} = 2i Im a = 0, and hence a = 0.
This means that if x*U*Ux = x*x for all x ∈ C^2, then U*U = I; that is, U
is unitary (if U ∈ M_2). Now consider n > 2, and let A = U*U. Let x ∈ C^n
be such that all components other than the ith and jth, i < j, are 0. Then

\[
x^* A x = \begin{bmatrix} \bar{x}_i & \bar{x}_j \end{bmatrix} A(\{i, j\}) \begin{bmatrix} x_i \\ x_j \end{bmatrix}
\]

[see (0.7.1) for submatrix notation], and we have just shown that (g)
implies that A({i, j}) = I ∈ M_2. Since i and j are arbitrary, we conclude
that every 2-by-2 principal submatrix of A is the 2-by-2 identity. The only
such A is A = I ∈ M_n, and, since the case n = 1 is obvious, we conclude
that (g) implies (a), which completes the proof. □


2.1.5 Definition. A linear transformation T: C^n → C^m is called a
Euclidean isometry if x*x = (Tx)*(Tx) for all x ∈ C^n. Theorem (2.1.4) says
that a square complex matrix U ∈ M_n is a Euclidean isometry (via
U: x → Ux) if and only if it is unitary. See Section 5.2 for other kinds of
isometries.


Exercise. Let

\[
T(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\]

where θ is a real parameter. (a) If U ∈ M_2(R), show that U is real
orthogonal if and only if U = T(θ) or

\[
U = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} T(\theta)
\]

for some θ ∈ R. (b) If U ∈ M_2(R), show that U is real orthogonal if and
only if U = T(θ) or

\[
U = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} T(\theta)
\]

for some θ ∈ R. These are two different presentations, in terms of a
parameter θ, of the 2-by-2 real orthogonal matrices. Interpret them
geometrically.


2.1.6 Observation. If U, V ∈ M_n are unitary (respectively real
orthogonal), then the product UV is also unitary (respectively real
orthogonal).


Exercise. Use (b) of (2.1.4) to prove (2.1.6). 




Exercise. If {x_1, x_2, ..., x_k} ⊂ C^n is an orthonormal set and if
U ∈ M_n is unitary, show that {Ux_1, ..., Ux_k} is an orthonormal set.


2.1.7 Observation. The set of unitary (respectively real orthogonal)
matrices in M_n forms a group. This group is generally referred to as the
n-by-n unitary (respectively orthogonal) group, a subgroup of GL(n, C)
[see Section (0.5)].


Exercise. Recall that a group is a set that is closed under a single
associative binary operation ("multiplication") and such that the identity
for and inverses under the operation are contained in the set. Verify
(2.1.7). Hint: Use (2.1.6) for closure; matrix multiplication is
associative; I ∈ M_n is unitary; and U* = U^{-1} is again unitary.


The set (group) of unitary matrices in M_n has another very important
property. Notions of "convergence" and "limit" of a sequence of matrices
will be presented precisely in Chapter 5, but can be understood here in
terms of "convergence" and "limit" of each i, j entry. The defining
identity U*U = I means that every column of U has Euclidean length 1, and
hence no entry u_ij of U = [u_ij] can have absolute value greater than 1.
If we think of the set of unitary matrices as a subset of C^{n^2}, this
says it is a bounded subset. If U_k = [u_ij^(k)] is a sequence of unitary
matrices, k = 1, 2, ..., such that lim_{k→∞} u_ij^(k) = u_ij^(0) exists for
all i, j = 1, 2, ..., n, then from the identity U_k* U_k = I for all
k = 1, 2, ... we see that lim_{k→∞} U_k* U_k = U_0* U_0 = I, where
U_0 = [u_ij^(0)]. Thus, the limit matrix U_0 is also unitary. This says
that the set of unitary matrices is a closed subset of C^{n^2}.

Since a closed and bounded subset of a finite dimensional Euclidean
space is a compact set (see Appendix E), we conclude that the set (group)
of unitary matrices in M_n is compact. For our purposes, the most
important consequence of this observation is the following selection
principle for unitary matrices.


2.1.8 Lemma. Let U_1, U_2, ... be a given sequence of unitary matrices
in M_n. There exists a subsequence U_{k_1}, U_{k_2}, ... such that all of
the entries of U_{k_i} converge (as sequences of complex numbers) to the
entries of a unitary matrix U_0 as i → ∞.


Proof: All that is required here is the fact that from any infinite sequence
in a compact set one may always select a convergent subsequence. We have
already observed that if a sequence of unitary matrices converges to some
matrix, then the limit matrix must be unitary. □




The unitary limit guaranteed by the lemma need not be unique; it can 
depend upon the subsequence chosen. 


Exercise. Consider the sequence of unitary matrices

\[
U_k = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}^k, \qquad k = 1, 2, \ldots
\]

Show that there are two possible limits of subsequences.


Exercise. The selection principle (2.1.8) applies as well to the orthogonal 
group; that is, a sequence of real orthogonal matrices has a subsequence 
that converges to a real orthogonal matrix. Verify this by tracing through 
the same logic in the real case. 


Compactness of the unitary group is invoked in Problem 3 of the next 
section. We shall have occasion to use it elsewhere in the book. 

A unitary matrix U has the property that U^{-1} equals U*. One way to
generalize the notion of a unitary matrix is to require that U^{-1} be
similar to U*. The set of such matrices is easily characterized as the
range of the mapping A → A^{-1}A* for all nonsingular A ∈ M_n.


2.1.9 Theorem. Let A ∈ M_n be a nonsingular matrix. Then A^{-1} is similar
to A* if and only if there is a nonsingular matrix B ∈ M_n such that
A = B^{-1}B*.


Proof: If A = B^{-1}B* for some nonsingular B ∈ M_n, then
A^{-1} = (B*)^{-1}B and B*A^{-1}(B*)^{-1} = B(B*)^{-1} = (B^{-1}B*)* = A*,
so A^{-1} is similar to A* via the similarity B*. Conversely, if A^{-1} is
similar to A*, then there is a nonsingular matrix S ∈ M_n such that
SA^{-1}S^{-1} = A*. Set S_θ = e^{iθ}S for θ ∈ R and notice that
S_θ A^{-1} S_θ^{-1} = e^{iθ}SA^{-1}(e^{-iθ}S^{-1}) = SA^{-1}S^{-1} = A*.
But then S_θ = A* S_θ A and S_θ* = A* S_θ* A. Adding these two identities
gives H_θ = A* H_θ A, where H_θ = S_θ + S_θ* is Hermitian. If H_θ is
singular, there is some nonzero x ∈ C^n such that
0 = H_θ x = S_θ x + S_θ* x, so -x = S_θ^{-1} S_θ* x = e^{-2iθ} S^{-1}S* x
and S^{-1}S* x = -e^{2iθ} x. Choose a value of θ = θ_0 ∈ [0, 2π) such that
-e^{2iθ_0} is not an eigenvalue of S^{-1}S*; the resulting Hermitian
matrix H = H_{θ_0} is nonsingular and has the property that H = A*HA.

Now choose any complex α such that |α| = 1 and α is not an eigenvalue
of A*. Set B = β(αI - A*)H, where the complex parameter β ≠ 0 is to be
chosen, and observe that B is nonsingular. We want to have A = B^{-1}B*,
or BA = B*. Compute B* = H(\bar{β}\bar{α}I - \bar{β}A), and
BA = β(αI - A*)HA = β(αHA - A*HA) = β(αHA - H) = H(αβA - βI). We shall be
done if we can select a nonzero β such that β = -\bar{β}\bar{α}, but if
α = e^{iθ}, then β = e^{i(π-θ)/2} will do. □




Problems 
1. If U ∈ M_n is unitary, show that |det U| = 1.


2. If λ ∈ σ(U) and U ∈ M_n is unitary, show that |λ| = 1. Hint: Use the
isometry property (2.1.4g).


3. Given real parameters θ_1, θ_2, ..., θ_n, show that

U = diag(e^{iθ_1}, e^{iθ_2}, ..., e^{iθ_n})

is unitary.


4. Characterize the diagonal real orthogonal matrices.


5. Show that the permutation matrices (0.9.5) in M_n are orthogonal and
that the permutation matrices form a subgroup (a subset which is itself a
group) of the group of real orthogonal matrices. How many different
permutation matrices are there in M_n?


6. Can you give a presentation in terms of parameters of the 3-by-3 
orthogonal group? Recall the two presentations of the 2-by-2 orthogonal 
group given in the section. 


7. Provide the details for the following alternate proof that (g)
implies (a) in (2.1.4). Show that (g) means that x*(U*U - I)x = 0 for
all x ∈ C^n. Let H = U*U - I and observe that H = H*. Consider
0 = (x + e^{iθ}y)*H(x + e^{iθ}y) for all x, y ∈ C^n and all θ ∈ R. Expand
this identity and show that x*Hy = 0 for all x, y ∈ C^n. Conclude that
H = 0 by making systematic choices for x and y.


8. A matrix A ∈ M_n such that AA^T = I is said to be orthogonal. A real
orthogonal matrix is unitary, but a nonreal orthogonal matrix need not
be unitary. (a) Let

\[
K = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \in M_2(R)
\]

Show that A(t) = (cosh t)I + (i sinh t)K ∈ M_2 is orthogonal for all t ∈ R
but that A(t) is unitary only for t = 0. The hyperbolic functions are
defined by cosh t = (e^t + e^{-t})/2, sinh t = (e^t - e^{-t})/2. (b) Show
that, unlike the unitary matrices, the set of complex orthogonal matrices
is not a bounded set, and it is therefore not a compact set. (c) Show that,
like the unitary matrices, the set of complex orthogonal matrices of a
given size forms a group. Despite this fact, common usage reserves the
term orthogonal group for the smaller (and compact) group of real
orthogonal matrices of a given size. (d) If A ∈ M_n is orthogonal, show
that |det A| = 1 but that A could have eigenvalues λ with |λ| ≠ 1. Hint:
Consider A(t) in (a) to show that |λ(t)| can be arbitrarily large. (e) If
A ∈ M_n is orthogonal, show that \bar{A}, A^T, and A* are all orthogonal
and nonsingular. Do the rows or columns of A form an orthogonal set?
(f) Characterize the diagonal orthogonal matrices. Compare with Problem 4.
To help avoid confusion, some authors refer to an orthogonal matrix which
is not necessarily real as a complex orthogonal matrix. This distinction
is not always made in the literature, and the term orthogonal sometimes
means what we have called a real orthogonal matrix.


9. If U ∈ M_n is unitary, show that \bar{U}, U^T, and U* are all unitary.


10. If U ∈ M_n is unitary, show that x, y ∈ C^n are orthogonal if and only
if Ux and Uy are orthogonal.


11. One might call a nonsingular matrix A ∈ M_n skew-orthogonal if
A^{-1} = -A^T. Show that A is skew-orthogonal if and only if ±iA is
orthogonal. More generally, if θ ∈ R, show that A^{-1} = e^{iθ}A^T if and
only if e^{iθ/2}A is orthogonal. What is this for θ = π? for θ = 0?


12. Show that if A ∈ M_n is similar to a unitary matrix, then A^{-1} is
similar to A*.


13. Consider the matrix diag(2, 1/2) ∈ M_2 and show that the set of
matrices that are similar to unitary matrices is a proper subset of the
set of matrices A for which A^{-1} is similar to A*.


14. Show that the intersection of the group of unitary matrices in
M_n with the group of complex orthogonal matrices in M_n is the group
of real orthogonal matrices in M_n. Hint: Consider U = A + iB, where
A, B ∈ M_n and A, B are real. If U is both unitary and complex orthogonal,
show that B^T B = 0 and hence (Be_i)^T(Be_i) = 0 for each standard
unit basis vector e_i ∈ R^n, and hence every column of B is 0.


Further Reading. For more information about generalized unitary matrices
that satisfy the conditions of Theorem (2.1.9), see C. R. DePrima
and C. R. Johnson, "The Range of A^{-1}A* in GL(n, C)," Linear Algebra
Appl. 9 (1974), 209-222.


2.2 Unitary equivalence 


Since U* = U^{-1} for unitary U, the transformation on M_n given by
A → U*AU is a similarity transformation if U is unitary. This special type
of similarity is called unitary similarity or unitary equivalence.


2.2.1 Definition. A matrix B ∈ M_n is said to be unitarily equivalent to
A ∈ M_n if there is a unitary matrix U ∈ M_n such that B = U*AU. If U may
be taken to be real (and hence is real orthogonal), then B is said to be
(real) orthogonally equivalent to A.


Exercise. Show that unitary equivalence is an equivalence relation. 


2.2.2 Theorem. If A = [a_ij] and B = [b_ij] ∈ M_n are unitarily
equivalent, then

\[
\sum_{i,j=1}^{n} |b_{ij}|^2 = \sum_{i,j=1}^{n} |a_{ij}|^2
\]

Proof: Observe that Σ_{i,j} |a_ij|^2 = tr A*A, by carrying out the matrix
multiplication. Thus, it suffices to check that tr B*B = tr A*A. But if
B = U*AU, then tr B*B = tr U*A*UU*AU = tr U*A*AU = tr A*A, because the
trace is a similarity invariant. □
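
A small numerical check may make the invariant concrete. The sketch below compares the sum of squared absolute values of the entries under a unitary equivalence and under a similarity that is not unitary; the matrices A, U, and S are illustrative choices, not taken from the text.

```python
import math

def conj_transpose(M):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(M)
    return [[complex(M[c][r]).conjugate() for c in range(n)] for r in range(n)]

def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def sum_abs_squares(M):
    """Sum of |m_ij|^2 over all entries, which equals tr M*M."""
    return sum(abs(e) ** 2 for row in M for e in row)

A = [[1, 2 + 1j], [0, 3]]

# A unitary matrix (illustrative choice): its two columns are orthonormal.
c, s = math.cos(0.7), math.sin(0.7)
U = [[c, -1j * s], [-1j * s, c]]
B = matmul(matmul(conj_transpose(U), A), U)     # B = U* A U

# A nonsingular but non-unitary S, with its inverse written out by hand.
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]
C = matmul(matmul(S_inv, A), S)                 # C = S^{-1} A S

print(sum_abs_squares(A), sum_abs_squares(B), sum_abs_squares(C))
```

Up to rounding, A and B both give 15, while the merely similar matrix C gives 11, so the quantity in (2.2.2) is not a general similarity invariant.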


Exercise. Theorem (2.2.2) says that tr A*A is a unitary similarity
invariant. Carry out another proof without using A*A, but using the fact
from the preceding section that multiplication by a unitary matrix leaves
the Euclidean length of a vector unchanged. Note that matrix
multiplication from the left multiplies the columns, and matrix
multiplication from the right multiplies the rows of a matrix.


Exercise. Show that

\[
\begin{bmatrix} 3 & 1 \\ -2 & 0 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}
\]

are similar but are not unitarily equivalent.


Since unitary equivalence implies similarity, but not conversely, the
unitary equivalence relation partitions M_n into finer equivalence classes
than the similarity equivalence relation. Unitary equivalence, like
similarity, corresponds to a change of basis, but of a special type - a change
from one orthonormal basis to another. An orthonormal change of basis 
leaves unchanged the sum of squares of the absolute values of the entries, 
a quantity that may be changed in a nonorthonormal change of basis. 
Unitary equivalence is computationally simpler than similarity because 
the conjugate transpose is much easier to compute than the inverse. It 
also better preserves accuracy in the presence of round-off errors, and 
is therefore preferable in numerical calculations. The precise reasons for 
this are not explained here, but an intuitive explanation lies in the length- 
preserving nature of multiplication by a unitary matrix. 

Two special (and very simple) types of unitary matrices give unitary
equivalence transformations that are very important in eigenvalue
calculations.


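2.2.3 Example: plane rotations. Let

\[
U(\theta; i, j) =
\begin{bmatrix}
1 &        &            &        &             &        &   \\
  & \ddots &            &        &             &        &   \\
  &        & \cos\theta & \cdots & -\sin\theta &        &   \\
  &        & \vdots     & \ddots & \vdots      &        &   \\
  &        & \sin\theta & \cdots & \cos\theta  &        &   \\
  &        &            &        &             & \ddots &   \\
  &        &            &        &             &        & 1
\end{bmatrix}
\]

in which the entries cos θ and -sin θ lie in row i, the entries sin θ and
cos θ lie in row j, and the two displayed trigonometric columns are
columns i and j. This is simply the identity matrix, with the i, i and
j, j entries replaced by cos θ and the i, j entry (respectively j, i entry)
replaced by -sin θ (respectively sin θ).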


Exercise. Verify that U(θ; i, j) is an orthogonal matrix in M_n(R) for any
pair of indices 1 ≤ i < j ≤ n and any angle parameter 0 ≤ θ < 2π. The
matrix U(θ; i, j) simply carries out a rotation (through an angle θ) in the
i, j coordinate plane. Notice that left multiplication by U(θ; i, j) affects
only rows i and j and right multiplication by U(θ; i, j) affects only
columns i and j of the matrix multiplied. Thus, under unitary equivalence
via U(θ; i, j), only rows and columns i and j are changed. Unitary
equivalences via plane rotations are the basic feature of eigenvalue
calculation schemes of Jacobi and Givens (see Problems 1 and 2).
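
The observations in this exercise can be tried out numerically. The following sketch builds U(θ; i, j) as described in (2.2.3) and reports which rows and columns of a test matrix change; the function names, the size n = 4, and the test matrix are illustrative assumptions.

```python
import math

def plane_rotation(n, i, j, theta):
    """U(theta; i, j) in M_n(R) for 1-based indices i < j, as in (2.2.3)."""
    U = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    U[i - 1][i - 1] = math.cos(theta)
    U[j - 1][j - 1] = math.cos(theta)
    U[i - 1][j - 1] = -math.sin(theta)
    U[j - 1][i - 1] = math.sin(theta)
    return U

def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

n, i, j, theta = 4, 2, 4, 0.3
U = plane_rotation(n, i, j, theta)
A = [[float(4 * r + c + 1) for c in range(n)] for r in range(n)]  # arbitrary test matrix

# Orthogonality: U^T U should be the identity up to rounding.
UtU = matmul(transpose(U), U)
max_dev = max(abs(UtU[r][c] - (1.0 if r == c else 0.0))
              for r in range(n) for c in range(n))

# Rows changed by left multiplication, columns changed by right multiplication (1-based).
UA, AU = matmul(U, A), matmul(A, U)
changed_rows = [r + 1 for r in range(n) if UA[r] != A[r]]
changed_cols = [c + 1 for c in range(n) if any(AU[r][c] != A[r][c] for r in range(n))]
print(max_dev, changed_rows, changed_cols)
```

With i = 2 and j = 4, only rows 2 and 4 of UA and only columns 2 and 4 of AU differ from those of A.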


2.2.4 Example: Householder transformations. Let w ∈ C^n be a nonzero
vector and define U_w ∈ M_n by U_w = I - tww* in which t = 2(w*w)^{-1}.
Note that ww* ∈ M_n and w*w is a positive scalar. If w were normalized
(w*w = 1), t would be 2 and U_w would be I - 2ww*. Often the matrix U_w
is presented assuming that w is already normalized.


Exercise. Show that U_w acts as the identity on the complementary
subspace w^⊥ and that it acts as a reflection on the one-dimensional
subspace spanned by w; that is, U_w x = x if x ⊥ w and U_w w = -w.




Exercise. Show that U_w is both unitary and Hermitian (U_w* = U_w). A
matrix of the form U_w is called a Householder transformation. Unitary
equivalence via a U_w is sometimes also referred to as a Householder
transformation. These transformations arise in a number of contexts
including the eigenvalue calculation scheme of Householder (see Problems
4 and 5) and other unitary reductions. Note that Householder
transformations in general change all the entries of a matrix or vector to
which they are applied, but they provide extremely efficient and accurate
reductions for a number of uses.
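
The properties claimed for U_w in these two exercises are easy to test on an example. The sketch below forms U_w = I - tww* explicitly (which is convenient for illustration, though not how such transformations are usually applied in practice); the vector w, the vector x, and the helper names are illustrative assumptions.

```python
def householder(w):
    """Return U_w = I - t w w* with t = 2 / (w* w), for a nonzero vector w."""
    n = len(w)
    w = [complex(c) for c in w]
    t = 2.0 / sum(abs(c) ** 2 for c in w)
    return [[(1.0 if r == c else 0.0) - t * w[r] * w[c].conjugate()
             for c in range(n)] for r in range(n)]

def apply(U, x):
    """Matrix-vector product U x."""
    return [sum(U[r][c] * x[c] for c in range(len(x))) for r in range(len(U))]

w = [1, 1j, 2]
x = [2, 0, -1]          # orthogonal to w, since w* x = 2 - 2 = 0
U = householder(w)
n = len(w)

Uw, Ux = apply(U, w), apply(U, x)

# Largest deviation of U from being Hermitian, and of U* U from the identity.
hermitian_dev = max(abs(U[r][c] - U[c][r].conjugate())
                    for r in range(n) for c in range(n))
unitary_dev = max(abs(sum(U[k][r].conjugate() * U[k][c] for k in range(n))
                      - (1.0 if r == c else 0.0))
                  for r in range(n) for c in range(n))
print(Uw, Ux, hermitian_dev, unitary_dev)
```

Up to rounding, U_w w = -w and U_w x = x, and both deviations are at the level of machine precision.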


Theorem (2.2.2) provides a necessary but not sufficient condition for
two given matrices to be unitarily equivalent. It can be augmented with
additional identities that collectively do provide necessary and sufficient
conditions. A key role is played by the following simple notion. Let s, t
be two given noncommuting variables. We refer to any finite formal
product of nonnegative powers of s, t

\[
W(s, t) = s^{m_1} t^{n_1} s^{m_2} t^{n_2} \cdots s^{m_k} t^{n_k}, \qquad m_1, n_1, \ldots, m_k, n_k \ge 0 \tag{2.2.5}
\]

as a word in s and t. The degree of the word W(s, t) is the nonnegative
integer m_1 + n_1 + m_2 + n_2 + ... + m_k + n_k, that is, the sum of all
the exponents in the word. If A ∈ M_n is given, we may formally define a
word in A and A* as

\[
W(A, A^*) = A^{m_1} (A^*)^{n_1} A^{m_2} (A^*)^{n_2} \cdots A^{m_k} (A^*)^{n_k}
\]

Since the powers of A and A* need not commute, it may not be possible
to simplify the expression of W(A, A*) by rearranging the terms in the
product.

If A is unitarily equivalent to some B ∈ M_n, then A = UBU* for some
unitary U ∈ M_n and one computes readily that

\[
\begin{aligned}
W(A, A^*) &= (UBU^*)^{m_1} (UB^*U^*)^{n_1} \cdots (UBU^*)^{m_k} (UB^*U^*)^{n_k} \\
&= U B^{m_1} U^* U (B^*)^{n_1} U^* \cdots U B^{m_k} U^* U (B^*)^{n_k} U^* \\
&= U B^{m_1} (B^*)^{n_1} \cdots B^{m_k} (B^*)^{n_k} U^* \\
&= U W(B, B^*) U^*
\end{aligned}
\]

Thus, tr W(A, A*) = tr UW(B, B*)U* = tr W(B, B*). If we take the word
W(s, t) = ts, we obtain the identity in Theorem (2.2.2).

If one considers all possible words W(s, t), this observation gives
infinitely many necessary conditions for two matrices to be unitarily
equivalent. A theorem of W. Specht, which we state without proof,
guarantees that these infinitely many necessary conditions are also
sufficient.



2.2.6 Theorem. Two given matrices A, B ∈ M_n are unitarily equivalent
if and only if

tr W(A, A*) = tr W(B, B*)                                        (2.2.7)

for every word W(s, t) in two noncommuting variables.

Specht’s theorem can be used to show that two given matrices are not 
unitarily equivalent, but except in special situations (see Problem 6), it 
may be useless in showing that two given matrices are unitarily equivalent 
because infinitely many conditions must be verified. Fortunately, there is 
an improvement to Specht’s theorem due to C. Pearcy that says it suffices 
to check the trace identities (2.2.7) for only finitely many words. 


2.2.8 Theorem. Two given matrices A, B ∈ M_n are unitarily equivalent
if and only if tr W(A, A*) = tr W(B, B*) for every word W(s, t) of degree
at most 2n^2.

The finite bound in Pearcy's theorem is a vast improvement on Specht's
theorem, but even it is known to be extremely conservative. For n = 2, it
actually suffices to check the trace identities (2.2.7) for the three words
W(s, t) = s, s^2, and ts rather than consider all the words of degree at
most 2(2^2) = 8. For n = 3, it suffices to check the trace identities for
the nine words W(s, t) = s, s^2, ts, s^3, ts^2, t^2s^2, tsts, ts^2ts, and
ts^2t^2s rather than consider all the words of degree at most 2(3^2) = 18.
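
For n = 2 the three-word test just described is simple to carry out. The sketch below evaluates tr W(A, A*) for a word written as a string over the letters s (standing for A) and t (standing for A*); the string encoding, the function names, and the tolerance are assumptions made for this illustration.

```python
def conj_transpose(M):
    """Conjugate transpose of a square matrix stored as a list of rows."""
    n = len(M)
    return [[complex(M[c][r]).conjugate() for c in range(n)] for r in range(n)]

def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def word_trace(word, A):
    """tr W(A, A*) for a word given as a string over 's' (A) and 't' (A*)."""
    n = len(A)
    factors = {"s": A, "t": conj_transpose(A)}
    P = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for letter in word:
        P = matmul(P, factors[letter])
    return sum(P[r][r] for r in range(n))

WORDS_2X2 = ("s", "ss", "ts")   # the words s, s^2, ts named in the text for n = 2

def same_traces_2x2(A, B, tol=1e-9):
    """True if the three trace identities agree for A and B within tol."""
    return all(abs(word_trace(w, A) - word_trace(w, B)) <= tol for w in WORDS_2X2)

A = [[3, 1], [-2, 0]]
B = [[1, 1], [0, 2]]
P = [[0, 1], [1, 0]]                           # a permutation matrix, hence unitary
C = matmul(matmul(conj_transpose(P), A), P)    # C = P* A P

print([word_trace(w, A) for w in WORDS_2X2])
print([word_trace(w, B) for w in WORDS_2X2])
print(same_traces_2x2(A, B), same_traces_2x2(A, C))
```

For the two matrices of the earlier exercise, the traces of s and s^2 agree (3 and 5) but tr A*A = 14 while tr B*B = 6, so the test reports that A and B are not unitarily equivalent, whereas A and P*AP pass all three identities.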


Problems 

1. Let A = [a_ij] ∈ M_n(R) be symmetric (A^T = A), but not diagonal,
and suppose that indices i ≠ j are chosen so that |a_ij| is as large as
possible. Define θ by (a_ii - a_jj)/2a_ij = cot(2θ) and let U(θ; i, j), as
given in (2.2.3), be the resulting plane rotation. Use (2.2.2) to show that
if B = U(θ; i, j)*AU(θ; i, j) = [b_ij], then
Σ_{i≠j} |b_ij|^2 < Σ_{i≠j} |a_ij|^2. Show that repeated applications of
such plane rotations (chosen in the same way for B and its successors)
will decrease the sums of the squares of the off-diagonal entries while
preserving the sums of the squares of all the entries; at each step, the
matrix is "more nearly diagonal" than at the step before. This is the
method of Jacobi for calculating the eigenvalues of a real symmetric
matrix. This method produces a sequence of matrices that converges to a
real diagonal matrix. Why must the diagonal entries of the limit be the
eigenvalues of A?
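
One step of the procedure in this problem can be sketched as follows. The angle is obtained from cot(2θ) = (a_ii - a_jj)/2a_ij by way of atan2, which is an implementation choice that avoids a division when a_ii = a_jj; the function names, the number of steps, and the test matrix are illustrative assumptions.

```python
import math

def jacobi_step(A):
    """One Jacobi rotation on a real symmetric matrix A; returns B = U^T A U."""
    n = len(A)
    # Position (i, j), i < j, of an off-diagonal entry of largest absolute value.
    i, j = max(((p, q) for p in range(n) for q in range(p + 1, n)),
               key=lambda pq: abs(A[pq[0]][pq[1]]))
    # Solve cot(2 theta) = (a_ii - a_jj) / (2 a_ij) for theta.
    theta = 0.5 * math.atan2(2.0 * A[i][j], A[i][i] - A[j][j])
    c, s = math.cos(theta), math.sin(theta)
    # Plane rotation U(theta; i, j) as in (2.2.3), with 0-based indices here.
    U = [[1.0 if r == k else 0.0 for k in range(n)] for r in range(n)]
    U[i][i] = c
    U[j][j] = c
    U[i][j] = -s
    U[j][i] = s
    AU = [[sum(A[r][k] * U[k][q] for k in range(n)) for q in range(n)]
          for r in range(n)]
    return [[sum(U[k][r] * AU[k][q] for k in range(n)) for q in range(n)]
            for r in range(n)]

def off_diagonal_sum(A):
    """Sum of squares of the off-diagonal entries."""
    n = len(A)
    return sum(A[r][q] ** 2 for r in range(n) for q in range(n) if r != q)

def total_sum(A):
    """Sum of squares of all entries."""
    return sum(e ** 2 for row in A for e in row)

# Illustrative symmetric matrix with eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
A0 = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
B = jacobi_step(A0)
M = A0
for _ in range(100):
    M = jacobi_step(M)
diagonal = sorted(M[r][r] for r in range(3))
print(off_diagonal_sum(A0), off_diagonal_sum(B), diagonal)
```

After the first step the off-diagonal sum of squares drops from 4 to 2 while the total sum of squares stays at 16, and after repeated steps the diagonal entries approach the three eigenvalues.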


2. The eigenvalue calculation method of Givens for real symmetric
matrices (or general real matrices) also utilizes plane rotations, but in
a rather different way. Show that a symmetric matrix A = [a_ij] ∈ M_n(R)
is orthogonally equivalent to a tridiagonal (symmetric) matrix via plane
rotations and that a general A ∈ M_n(R) is orthogonally equivalent to
(lower) Hessenberg form via plane rotations. See (0.9.9) and (0.9.10) for
tridiagonal and Hessenberg matrices. Hint: Choose a plane rotation
U_{2,3} of the form U(θ; 2, 3) in which (2.2.9b) is used to choose θ so
that the 1, 3 entry of U_{2,3}^T A U_{2,3} is 0. Choose another plane
rotation of the form U(θ; 2, 4) and continue in this way to zero out the
rest of the first row. Then start on the second row beginning with the
2, 4 entry. Choose a plane rotation of the form U(θ; 3, 4) to zero out the
2, 4 entry and so on. Note that this process does not disturb previously
manufactured 0 entries. Note also that orthogonal equivalence preserves
symmetry. The characteristic polynomial of a tridiagonal matrix may be
determined easily and the eigenvalues determined by a root-finding scheme.
Notice that Givens's method produces a tridiagonal matrix after finitely
many plane rotations, but it does not display the eigenvalues or the
eigenvectors, which must then be obtained from some further calculation.
Jacobi's method does not, in general, terminate after finitely many plane
rotations, but it tries to produce a diagonal matrix as well as an
orthonormal set of eigenvectors.


3. Show that every matrix $A \in M_n$ is unitarily equivalent to a matrix with
equal main diagonal entries. Hint: (a) If $A \in M_2$, consider $A - \tfrac{1}{2}(\operatorname{tr} A)I$
to show that it suffices to consider only the case $\operatorname{tr} A = 0$. If $x \in \mathbf{C}^2$ is a unit
vector such that $x^*Ax = 0$, let $U = [x\ y] \in M_2$ be unitary, show that the
1,1 entry of $U^*AU$ is zero, and use the trace condition to show that the 2,2
entry is zero as well. To find such a vector $x$, let $w, z$ be unit eigenvectors
associated with the two eigenvalues $\pm\lambda$ of $A$. If $\lambda = 0$ take $x = w$. If $\lambda \ne 0$
let $x(\theta) = e^{i\theta}w + z$; show that $x(\theta) \ne 0$ for all $\theta \in \mathbf{R}$, and $x(\theta)^*Ax(\theta) = 0$
for some $\theta \in \mathbf{R}$; let $x = x(\theta)/[x(\theta)^*x(\theta)]^{1/2}$ for this $\theta$. Remark: If $A \in
M_2(\mathbf{R})$, it is easy to construct a (real) plane rotation $U = U(\theta; 1, 2)$ that
makes the diagonal entries of $U^TAU$ equal, but this does not help in the
complex case. (b) For $A = [a_{ij}] \in M_n$, define $f(A) = \max\{|a_{ii} - a_{jj}| : i, j =
1, 2, \ldots, n\}$ and let $A_2 = \begin{bmatrix} a_{ii} & a_{ij} \\ a_{ji} & a_{jj} \end{bmatrix}$ for a pair of indices $i, j$ for which $f(A) =
|a_{ii} - a_{jj}|$. Let $U_2 \in M_2$ be a unitary matrix such that $U_2^*A_2U_2$ has equal
main diagonal entries. Construct $U(i, j) \in M_n$ from $U_2$ in the same way
that $U(\theta; i, j)$ was constructed from a 2-by-2 plane rotation in (2.2.3) and
show that $f(U(i, j)^*AU(i, j)) < f(A)$ if there are not ties for the maxi-
mum absolute diagonal difference; if there are ties, this construction can
be repeated. Conclude that if $f(A) \ne 0$, there is a unitary $U \in M_n$ such
that $f(U^*AU) < f(A)$. Let $\mathcal{R}(A) = \{U^*AU : U \in M_n \text{ is unitary}\}$. Show that
$\mathcal{R}(A)$ is compact and note that $f$ is a continuous function on $\mathcal{R}(A)$. Let
$C \in \mathcal{R}(A)$ be such that $f(C) = \min\{f(B) : B \in \mathcal{R}(A)\}$. Show that $f(C) > 0$
is impossible and that the assertion follows from $f(C) = 0$.


4. Show that any vector $x \in \mathbf{R}^n$ with Euclidean length $r = (x^Tx)^{1/2}$ may
be transformed into any other vector $y \in \mathbf{R}^n$ of length $r$, $y \ne x$, by a House-
holder transformation, that is, $y = U_w x$ for some $w \in \mathbf{R}^n$. Hint: Try $w =
x - y$. What can you say if $x, y \in \mathbf{C}^n$?
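The hint of Problem 4 can be checked on a small case. This is a sketch under the assumption that the Householder transformation is $U_w = I - 2ww^T/(w^Tw)$ for real nonzero $w$; the function name `householder_apply` is hypothetical.

```python
def householder_apply(w, x):
    """Return U_w x, where U_w = I - 2 w w^T / (w^T w) for real nonzero w."""
    wtw = sum(wi * wi for wi in w)
    wtx = sum(wi * xi for wi, xi in zip(w, x))
    return [xi - 2.0 * wtx / wtw * wi for wi, xi in zip(w, x)]

x = [3.0, 4.0, 0.0]                      # length 5
y = [0.0, 0.0, 5.0]                      # another vector of length 5
w = [xi - yi for xi, yi in zip(x, y)]    # the hint: w = x - y
image = householder_apply(w, x)          # should reproduce y
```

Because $x$ and $y$ have the same length, $w^Tx = \tfrac{1}{2}w^Tw$, so $U_w x = x - w = y$.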




5. Householder's method for calculating eigenvalues of $A \in M_n(\mathbf{R})$, like
the method of Givens, first reduces $A$ to (upper) Hessenberg form (or tri-
diagonal form in the symmetric case). Show constructively that a matrix
of the form

$$\begin{bmatrix}
* & * & \cdots & * & * & \cdots & * \\
* & * & \cdots & * & * & \cdots & * \\
0 & * & \cdots & * & * & \cdots & * \\
\vdots & \ddots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & * & * & \cdots & * \\
0 & \cdots & 0 & 0 & * & \cdots & * \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & * & \cdots & *
\end{bmatrix}$$

(the fourth displayed column is column $k$, and the fifth displayed row is row $k+1$)
with 0's below the $(j+1)$st entry in the $j$th column, $j = 1, \ldots, k$, may be
transformed to a matrix of the same form, $j = 1, \ldots, k+1$, via a single
real orthogonal similarity using a Householder transformation. Con-
clude that any matrix $A \in M_n(\mathbf{R})$ may be reduced to upper Hessenberg
form by a sequence of $(n-2)$ Householder similarities and that a sym-
metric matrix $A \in M_n(\mathbf{R})$ may be reduced to tridiagonal form in the same
way. Hint: For the $(k+1)$st column, choose a Householder transfor-
mation $V \in M_{n-k-1}$ that transforms the $(n-k-1)$ vector of entries oc-
curring below the diagonal to an appropriate multiple of $[1, 0, \ldots, 0]^T \in
\mathbf{R}^{n-k-1}$. Then transform the entire matrix by a similarity via the orthog-
onal matrix

$$\begin{bmatrix} I & 0 \\ 0 & V \end{bmatrix} \in M_n$$

and see that the desired 0 pattern prevails.


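6. Let $A \in M_n$ and $B, C \in M_m$ be given. Use either Specht's theorem
(2.2.6) or Pearcy's theorem (2.2.8) to show that $B$ and $C$ are unitarily
equivalent if and only if any one of the following conditions holds:

(a) $\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$ and $\begin{bmatrix} A & 0 \\ 0 & C \end{bmatrix}$ are unitarily equivalent.

(b) $\begin{bmatrix} B & & 0 \\ & \ddots & \\ 0 & & B \end{bmatrix}$ and $\begin{bmatrix} C & & 0 \\ & \ddots & \\ 0 & & C \end{bmatrix}$ are unitarily equivalent, where both
direct sums contain the same number of terms.

(c) $\begin{bmatrix} A & & & 0 \\ & B & & \\ & & \ddots & \\ 0 & & & B \end{bmatrix}$ and $\begin{bmatrix} A & & & 0 \\ & C & & \\ & & \ddots & \\ 0 & & & C \end{bmatrix}$ are unitarily equivalent, where both
direct sums contain the same number of terms.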


7. Show that there are $2^{k-1}$ distinct words $W(s, t)$ of the form (2.2.5)
that have a given degree $k$ and conclude that there are at most $4^{n^2}$ distinct
words of degree at most $2n^2$.


8. Give an example of two 2-by-2 matrices that satisfy the identity 
(2.2.2) but are not unitarily equivalent. Explain why. 


Further Readings and Notes. For the original proof of Theorem (2.2.6), 
see W. Specht, “Zur Theorie der Matrizen II,” Jahresbericht der Deutschen 
Mathematiker Vereinigung 50 (1940), 19-23. Theorem (2.2.8) is proved 
in C. Pearcy, “A Complete Set of Unitary Invariants for Operators Gen- 
erating Finite W*-Algebras of Type I,” Pacific J. Math. 12 (1962), 1405- 
1416. A discussion of unitary equivalence of low-order matrices is in 
C. Pearcy, “A Complete Set of Unitary Invariants for 3 x 3 Complex 
Matrices,” Trans. Amer. Math. Soc. 104 (1962), 425-429. 


2.3 Schur’s unitary triangularization theorem 


Perhaps the most fundamentally useful fact of elementary matrix theory
is that any matrix $A \in M_n$ is unitarily equivalent to an upper triangular
matrix $T$ [and also to a lower triangular matrix]. The diagonal entries
of $T$ are, of course, the eigenvalues of $A$. Although this form is far
from unique, it represents the simplest form achievable under unitary
equivalence.


2.3.1 Theorem (Schur). Given $A \in M_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$ in
any prescribed order, there is a unitary matrix $U \in M_n$ such that

$$U^*AU = T = [t_{ij}]$$

is upper triangular, with diagonal entries $t_{ii} = \lambda_i$, $i = 1, \ldots, n$. That is,
every square matrix $A$ is unitarily equivalent to a triangular matrix whose
diagonal entries are the eigenvalues of $A$ in a prescribed order. Further-
more, if $A \in M_n(\mathbf{R})$ and if all the eigenvalues of $A$ are real, then $U$ may
be chosen to be real and orthogonal.


Proof: The proof is algorithmic and proceeds by a sequence of reduc-
tions of like type. Let $x^{(1)}$ be a normalized eigenvector of $A$ associated
with the eigenvalue $\lambda_1$. The nonzero vector $x^{(1)}$ may be extended to a basis

$$x^{(1)}, y^{(2)}, \ldots, y^{(n)}$$

of $\mathbf{C}^n$. Apply the Gram-Schmidt orthonormalization procedure (0.6.4)
to this basis to produce an orthonormal basis

$$x^{(1)}, z^{(2)}, \ldots, z^{(n)}$$

of $\mathbf{C}^n$. Array these orthonormal vectors left to right as the columns of a
unitary matrix $U_1$. Since the first column of $AU_1$ is $\lambda_1 x^{(1)}$, a calculation
reveals that $U_1^*(AU_1)$ has the form

$$U_1^*AU_1 = \begin{bmatrix} \lambda_1 & * \\ 0 & A_1 \end{bmatrix}$$

The matrix $A_1 \in M_{n-1}$ has eigenvalues $\lambda_2, \ldots, \lambda_n$. Let $x^{(2)} \in \mathbf{C}^{n-1}$ be a
normalized eigenvector of $A_1$ corresponding to $\lambda_2$, and do it all over
again. Determine a unitary $U_2 \in M_{n-1}$ such that

$$U_2^*A_1U_2 = \begin{bmatrix} \lambda_2 & * \\ 0 & A_2 \end{bmatrix}$$

and let

$$V_2 = \begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix}$$

The matrices $V_2$ and $U_1V_2$ are then unitary, and $V_2^*U_1^*AU_1V_2$ has the form

$$V_2^*U_1^*AU_1V_2 = \begin{bmatrix} \lambda_1 & * & * \\ 0 & \lambda_2 & * \\ 0 & 0 & A_2 \end{bmatrix}$$

Continue this reduction to produce unitary matrices $U_i \in M_{n-i+1}$, $i =
1, \ldots, n-1$ and unitary matrices $V_i \in M_n$, $i = 2, \ldots, n-1$. The matrix

$$U = U_1V_2V_3 \cdots V_{n-1}$$

is unitary and $U^*AU$ yields the desired form.

If all eigenvalues of $A \in M_n(\mathbf{R})$ happen to be real, then the corre-
sponding eigenvectors can be chosen to be real and all the above steps
may be carried out in real arithmetic, verifying the final assertion. □


Remark: Follow the proof of (2.3.1) to see that “upper triangular” 
could be replaced by “lower triangular” in the statement of the theorem 
with, of course, a different unitary equivalence U. 


2.3.2 Example. Neither the unitary matrix $U$ nor the triangular matrix
$T$ of Theorem (2.3.1) is unique. Not only may the diagonal entries of $T$
(the eigenvalues of $A$) appear in any order, but unitarily equivalent upper
triangular matrices may appear very different above the diagonal. For
example,

$$T_1 = \begin{bmatrix} 1 & 1 & 4 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{and}\quad T_2 = \begin{bmatrix} 2 & -1 & 3\sqrt{2} \\ 0 & 1 & \sqrt{2} \\ 0 & 0 & 3 \end{bmatrix}$$

are unitarily equivalent via

$$U = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix}$$

In general, many different upper triangular matrices can be in the same
unitary equivalence class.
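The unitary equivalence in Example (2.3.2) is easy to confirm by direct multiplication. The sketch below uses a hypothetical helper `matmul` and floating-point arithmetic, so equality is tested only up to rounding.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

r2 = math.sqrt(2.0)
T1 = [[1.0, 1.0, 4.0], [0.0, 2.0, 2.0], [0.0, 0.0, 3.0]]
T2 = [[2.0, -1.0, 3.0 * r2], [0.0, 1.0, r2], [0.0, 0.0, 3.0]]
U = [[1 / r2, 1 / r2, 0.0], [1 / r2, -1 / r2, 0.0], [0.0, 0.0, 1.0]]
Ut = [list(col) for col in zip(*U)]      # U is real, so U* = U^T

product = matmul(Ut, matmul(T1, U))      # U* T1 U, expected to equal T2
frob_T1 = sum(v * v for row in T1 for v in row)
frob_T2 = sum(v * v for row in T2 for v in row)
```

Both matrices have the same sum of squared absolute values of entries, as (2.2.2) requires of unitarily equivalent matrices, even though their entries above the diagonal differ.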

Remark: Notice that the technique of the proof (2.3.1) is simply that 
of sequential deflation, as outlined in Problem 8 in Section (1.4). 


Exercise. If $A \in M_n$ is unitarily equivalent to an upper triangular matrix
$T = [t_{ij}] \in M_n$, the entries $t_{ij}$ are not uniquely determined, but the quan-
tity $\sum_{i<j} |t_{ij}|^2$ is uniquely determined. Determine the value of $\sum_{i<j} |t_{ij}|^2$
in terms of the entries and eigenvalues of $A$. Hint: Use (2.2.2).


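Exercise. If $A = [a_{ij}]$ and $B = [b_{ij}] \in M_2$ are similar and if $\sum_{i,j} |a_{ij}|^2 =
\sum_{i,j} |b_{ij}|^2$, show that $A$ and $B$ are unitarily equivalent. Show by example
that this is not the case in higher dimensions. Hint: Notice that if $A$ and $B$
are unitarily equivalent, then so are $A + A^*$ and $B + B^*$. Consider

$$A = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 5 \\ 0 & 0 & 3 \end{bmatrix}$$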


It is a useful adjunct to (2.3.1) that a commuting family of matrices 
may be simultaneously upper triangularized. 


2.3.3 Theorem. Let $\mathcal{F} \subset M_n$ be a commuting family. There is a unitary
matrix $U \in M_n$ such that $U^*AU$ is upper triangular for every $A \in \mathcal{F}$.


Proof: Return to the proof of (2.3.1). Exploiting (1.3.17) at each step
of the proof in which a choice of an eigenvector (and unitary matrix)
is made, the same eigenvector (and unitary matrix) may be chosen for
every $A \in \mathcal{F}$. Moreover, unitary equivalence preserves commutativity,
and a partitioned multiplication calculation reveals that, if two matrices
of the form

$$\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{bmatrix}$$

commute, then $A_{22}$ and $B_{22}$ commute also. Thus, the commuting family
property is inherited by each $A_i$ at each stage in the reduction process of
the proof of (2.3.1). We conclude that all ingredients in the $U$ of (2.3.1)
may be chosen in the same way for all members of a commuting family,
thus verifying (2.3.3). Notice that we do not claim that any special order
may be chosen for the eigenvalues of the various family members. We
simply take them as they come using (1.3.17). □


A strictly real version of (2.3.1) is contained in the following theorem. 


2.3.4 Theorem. If $A \in M_n(\mathbf{R})$, there is a real orthogonal matrix $Q \in
M_n(\mathbf{R})$ such that

$$Q^TAQ = \begin{bmatrix} A_1 & & & * \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_k \end{bmatrix} \in M_n(\mathbf{R}), \qquad 1 \le k \le n \tag{2.3.5}$$

where each $A_i$ is a real 1-by-1 matrix, or a real 2-by-2 matrix with a non-
real pair of complex conjugate eigenvalues. The diagonal blocks $A_i$ may
be arranged in any prescribed order.

One cannot, in general, hope to reduce a real matrix to upper trian- 
gular form by a real similarity (let alone a real orthogonal similarity) 
because the diagonal entries would then be eigenvalues, which could be 
nonreal. The form (2.3.5) is the most nearly triangular form one can 
achieve by a real orthogonal similarity. It will not be upper triangular if 
A has any nonreal eigenvalues, but it will always be in upper Hessenberg 
form. 


Exercise. Modify the argument for (2.3.1) to prove (2.3.4). Hint: If $\lambda$ is a
real eigenvalue of the real matrix $A$, then there is a corresponding real
eigenvector that can be used to deflate $A$ as in (2.3.1). If $\lambda = \alpha + i\beta$ is a
nonreal eigenvalue for $A$ and if $Ax = \lambda x$, $x = u + iv \ne 0$, $u, v \in \mathbf{R}^n$, show
that $Au = \alpha u - \beta v$, $Av = \alpha v + \beta u$, and $A\bar{x} = \bar{\lambda}\bar{x}$, and that $\{x, \bar{x}\}$ is an
independent set. Deduce that $\{u, v\}$ is an independent set and apply the
Gram-Schmidt procedure to it to obtain a real orthonormal set $\{w, z\}$.
Let $Q_1$ be a real orthogonal matrix whose first two columns are $w$ and $z$.
Show that

$$Q_1^TAQ_1 = \begin{bmatrix} \begin{matrix} * & * \\ * & * \end{matrix} & * \\ 0 & A_1 \end{bmatrix}, \qquad A_1 \in M_{n-2}(\mathbf{R})$$

so that $A$ may be deflated two columns at a time in this case. Notice that
blocks $A_i$ corresponding to each real eigenvalue and each pair of complex
conjugate eigenvalues can be arranged in any prescribed order in (2.3.5).

There is also a real version of (2.3.3).


2.3.6 Theorem. Let $\mathcal{F} \subset M_n(\mathbf{R})$ be a commuting family. There is a real
orthogonal matrix $Q \in M_n(\mathbf{R})$ such that $Q^TAQ$ is of the form (2.3.5) for
every $A \in \mathcal{F}$.


Exercise. Modify the proof of (2.3.3) to prove (2.3.6). Hint: First deflate
all members of $\mathcal{F}$ using all the common real eigenvectors. Then consider
the common nonreal eigenvectors and deflate two columns at a time as in
the proof of (2.3.4). Notice that different members of $\mathcal{F}$ may have different
numbers of 2-by-2 diagonal blocks after the common real orthogonal
similarity, but if one member has a 2-by-2 block in a certain position and
another member does not, then the latter must have a pair of equal
1-by-1 blocks there.


Problems 
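1. Let $x \in \mathbf{C}^n$ be a given unit vector ($x^*x = 1$) and write $x = [x_1\ y^T]^T$,
where $x_1 \in \mathbf{C}$ and $y \in \mathbf{C}^{n-1}$. Choose $\theta \in \mathbf{R}$ such that $e^{i\theta}x_1 \ge 0$ and define
$z = e^{i\theta}x = [z_1\ \zeta^T]^T$, where $z_1 \in \mathbf{R}$ is nonnegative and $\zeta \in \mathbf{C}^{n-1}$. Show that
the matrix

$$V = \begin{bmatrix} z_1 & \zeta^* \\ \zeta & -I + \dfrac{1}{1 + z_1}\,\zeta\zeta^* \end{bmatrix}$$

is unitary. Hint: Compute $V^*V = V^2$. Conclude that the matrix $U =
e^{-i\theta}V = [x\ u_2\ \ldots\ u_n]$ is a unitary matrix whose first column is the given
vector $x$. This gives a constructive method to obtain the unitary ma-
trices needed for the successive deflation steps in the proof of Schur's
theorem (2.3.1).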
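A unitary matrix with a prescribed first column can be built explicitly. The sketch below assumes the Hermitian matrix $V$ with first column $z = e^{i\theta}x$ and lower right block $-I + \zeta\zeta^*/(1 + z_1)$, and then sets $U = e^{-i\theta}V$; the function name `unitary_with_first_column` is hypothetical.

```python
import cmath

def unitary_with_first_column(x):
    """Return a unitary U (list of rows) whose first column is the unit vector x."""
    n = len(x)
    theta = -cmath.phase(x[0])            # makes e^{i theta} x_1 nonnegative
    rot = cmath.exp(1j * theta)
    z = [rot * xi for xi in x]
    z1 = z[0].real                        # nonnegative real number
    zeta = z[1:]
    V = [[0j] * n for _ in range(n)]
    V[0][0] = z1
    for r in range(1, n):
        V[r][0] = zeta[r - 1]
        V[0][r] = zeta[r - 1].conjugate()
        for c in range(1, n):
            # Assumed block: -I + zeta zeta^* / (1 + z1)
            V[r][c] = zeta[r - 1] * zeta[c - 1].conjugate() / (1 + z1) - (1 if r == c else 0)
    return [[v / rot for v in row] for row in V]   # U = e^{-i theta} V

x = [0.6j, 0.8, 0.0]                      # a unit vector in C^3
U = unitary_with_first_column(x)
```

Since $z_1 \ge 0$, the denominator $1 + z_1$ never vanishes, so the construction needs no special cases.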


2. If $x \in \mathbf{R}^n$ is a given unit vector, show how to streamline the con-
struction described in Problem 1 to produce a real orthogonal matrix
$Q \in M_n(\mathbf{R})$ whose first column is $x$. Prove that your construction works.


3. Let $A \in M_n(\mathbf{R})$. Explain why the nonreal eigenvalues of $A$ (if any)
must occur in conjugate pairs.


4. Consider the family

$$\mathcal{F} = \left\{ \begin{bmatrix} 0 & -1 \\ \cdot & \cdot \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ \cdot & \cdot \end{bmatrix} \right\} \qquad \text{[second rows illegible in the source]}$$

and show that the hypothesis of commutativity in Theorem (2.3.3), while
sufficient to imply simultaneous unitary upper triangularizability of $\mathcal{F}$, is
not necessary.


5. Let $\mathcal{F} = \{A_1, \ldots, A_k\} \subset M_n$ be a given family, and let

$$\mathcal{G} = \{A_iA_j : i, j = 1, 2, \ldots, k\}$$

be the family of all pair-wise products of matrices in $\mathcal{F}$. It is a fact that if
$\mathcal{G}$ is commutative, then $\mathcal{F}$ can be simultaneously unitarily upper triangu-
larized if and only if every eigenvalue of every commutator $A_iA_j - A_jA_i$
is zero. Show that assuming commutativity of $\mathcal{G}$ is a weaker hypothesis
than assuming commutativity of $\mathcal{F}$. Show that the family $\mathcal{F}$ in Problem 4
has a corresponding $\mathcal{G}$ that is commutative, and that it also satisfies the
zero eigenvalue condition.


6. Let $A, B \in M_n$ be given, and suppose $A$ and $B$ are simultaneously
similar to upper triangular matrices; that is, $S^{-1}AS$ and $S^{-1}BS$ are both
upper triangular for some nonsingular $S \in M_n$. Show that every eigen-
value of $AB - BA$ must be zero. Hint: If $\Delta_1, \Delta_2 \in M_n$ are both upper
triangular, what is the main diagonal of $\Delta_1\Delta_2 - \Delta_2\Delta_1$?


7. Although every square matrix can be reduced to upper triangular
form by unitary similarity, this is not true for complex orthogonal simi-
larity. If a given $A \in M_n$ can be written as $A = Q\Delta Q^T$, where $Q \in M_n$ is
complex orthogonal and $\Delta \in M_n$ is upper triangular, show that $A$ has at
least one eigenvector $x \in \mathbf{C}^n$ such that $x^Tx \ne 0$. Consider $A = \begin{bmatrix} 1 & i \\ i & -1 \end{bmatrix}$ to
show that not every $A \in M_n$ can be upper-triangularized by a complex
orthogonal similarity.


8. Let $Q \in M_n$ be a given complex orthogonal matrix, and suppose $x \in \mathbf{C}^n$
is an eigenvector of $Q$ associated with an eigenvalue $\lambda \ne \pm 1$. Show that
$x^Tx = 0$. Hint: Multiply both sides of the identity $Qx = \lambda x$ by its trans-
pose. See Problem 8(a) of Section (2.1) for an example of a family of
2-by-2 complex orthogonal matrices with both eigenvalues different from
$\pm 1$. Show that none of these matrices can be reduced to upper triangular
form by orthogonal similarity.


Further Reading. For a proof of the stronger form of Theorem (2.3.3) as-
serted in Problem 5, see Y. P. Hong and R. A. Horn, "On Simultaneous
Reduction of Families of Matrices to Triangular or Diagonal Form by
Unitary Congruences," Linear and Multilinear Algebra 17 (1985), 271-288.


2.4 Some implications of Schur's theorem


Several elementary consequences of Schur's unitary triangularization
illustrate its utility.


Exercise. Use (2.3.1) to show that if $A \in M_n$ has eigenvalues $\lambda_1, \ldots, \lambda_n$,
counting multiplicity, then $\det A = \prod_{i=1}^n \lambda_i$ and $\operatorname{tr} A = \sum_{i=1}^n \lambda_i$. Recall
that this was proved in another way in Chapter 1. Hint: For the trace,
recall that $\operatorname{tr} AB = \operatorname{tr} BA$ follows from a direct calculation, so that the
trace is similarity invariant. What about the other elementary symmetric
functions of the eigenvalues?


The fact that every matrix satisfies its own characteristic equation
(2.4.2) follows from Schur's theorem and a simple observation about
multiplication of triangular matrices.


2.4.1 Lemma. Suppose that $R = [r_{ij}]$ and $T = [t_{ij}] \in M_n$ are upper
triangular and that $r_{ij} = 0$, $1 \le i, j \le k < n$, and $t_{k+1,k+1} = 0$. Let $T' =
[t'_{ij}] = RT$. Then $t'_{ij} = 0$, $1 \le i, j \le k+1$.


Proof: Since $R(\{1, 2, \ldots, k\}) = 0$ and $t_{k+1,k+1} = 0$, $R$ and $T$ have the form

$$R = \left[\begin{array}{c|c} 0 & * \\ \hline 0 & * \end{array}\right] \quad\text{and}\quad T = \left[\begin{array}{c|cc} * & * & * \\ \hline 0 & 0 & * \\ 0 & 0 & * \end{array}\right]$$

where both upper left blocks in the partitions are $k$-by-$k$. The upper left
$k$-by-$k$ block of $T'$ is clearly 0 by partitioned multiplication [see (0.7) for
notation and elementary facts]. Also, inspection reveals that the first
$k+1$ rows of $R$ have 0's in all nonzero positions of column $k+1$ of $T$, and
that the first $k+1$ columns of $T$ have 0's in all nonzero positions of row
$k+1$ of $R$. Matrix multiplication then shows that $T'$ (partitioned in the
same way) has the form

$$T' = \left[\begin{array}{c|cc} 0 & 0 & * \\ \hline 0 & 0 & * \\ 0 & 0 & * \end{array}\right]$$

and $T'(\{1, \ldots, k+1\}) = 0$, as was to be shown. □


Exercise. Show that the product of two upper triangular matrices is
upper triangular, and show that the product of two similarly partitioned
block upper triangular matrices is block upper triangular.


Exercise. Generalize (2.4.1) by showing that if $R$ and $T$ are upper trian-
gular and $T' = RT$, then

$$T'(\{i, i+1, \ldots, i+k\}) = R(\{i, i+1, \ldots, i+k\})\,T(\{i, i+1, \ldots, i+k\})$$


2.4.2 Theorem (Cayley-Hamilton). Let $p_A(t)$ be the characteristic
polynomial of $A \in M_n$. Then

$$p_A(A) = 0$$


Proof: Since $p_A(t)$ is of degree $n$ with leading coefficient 1 and the roots
of $p_A(t) = 0$ are precisely the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$, counting multi-
plicity, we may factor $p_A(t)$ as

$$p_A(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_n)$$

Using (2.3.1), write $A$ as

$$A = UTU^*$$

where $T$ is upper triangular with $\lambda_i$ in the $i$th diagonal position, $i = 1, \ldots, n$.
Now compute

$$\begin{aligned}
p_A(A) = p_A(UTU^*) &= (UTU^* - \lambda_1 I)(UTU^* - \lambda_2 I) \cdots (UTU^* - \lambda_n I) \\
&= [U(T - \lambda_1 I)U^*][U(T - \lambda_2 I)U^*] \cdots [U(T - \lambda_n I)U^*] \\
&= U[(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_n I)]U^* \\
&= U p_A(T) U^*
\end{aligned}$$

and notice that $p_A(A) = 0$ if and only if $p_A(T) = 0$. However, Lemma
(2.4.1) allows us to conclude that $p_A(T) = 0$. The upper left 1-by-1 block
of $T - \lambda_1 I$ is 0, and the 2,2 entry of $T - \lambda_2 I$ is 0; since both are upper
triangular, the upper left 2-by-2 block of $(T - \lambda_1 I)(T - \lambda_2 I)$ is 0. Induc-
tively, since the upper left $k$-by-$k$ block of $(T - \lambda_1 I) \cdots (T - \lambda_k I)$ and the
$k+1, k+1$ entry of $(T - \lambda_{k+1} I)$ are 0 and both are upper triangular, the
upper left $(k+1)$-by-$(k+1)$ block of $(T - \lambda_1 I) \cdots (T - \lambda_{k+1} I)$ is 0. Con-
tinuation until $n$ is reached allows us to conclude that the product $p_A(T) =
(T - \lambda_1 I) \cdots (T - \lambda_n I) = 0$, which completes the proof. □


Exercise. What is wrong with the following argument for the statement
$p_A(A) = 0$? "Since $p_A(\lambda) = 0$ for every eigenvalue $\lambda$ of $A \in M_n$, and since
the eigenvalues of $q(A)$, $q$ a polynomial, are the $q(\lambda)$, it follows that all
eigenvalues of $p_A(A)$ are 0. Therefore, $p_A(A)$ is 0." This is a common
mistaken argument for the Cayley-Hamilton theorem. Give an explicit
example to illustrate where it is in error.


Exercise. What is wrong with this argument? "Since $p_A(t) = \det(tI - A)$,
$p_A(A) = \det(AI - A) = \det(A - A) = \det 0 = 0$. Therefore, $p_A(A) = 0$."


If $p_A(t) = \det(tI - A)$ denotes the characteristic polynomial of $A \in M_n$,
the characteristic equation is $p_A(t) = 0$. The roots of the characteristic
equation are the eigenvalues of $A$. The Cayley-Hamilton theorem is
often paraphrased as "every square matrix satisfies its own characteristic
equation," but this must be understood carefully: The scalar polynomial
$p_A(t)$ is first computed as $p_A(t) = \det(tI - A)$, and one then forms the
matrix $p_A(A)$ from the characteristic polynomial.

We have proved the Cayley-Hamilton theorem for matrices with
complex entries, and hence it must hold for matrices whose entries come
from any subfield of the complex numbers (the reals or the rationals, for
example). In fact, the Cayley-Hamilton theorem is a completely formal
result that holds for matrices whose entries come from any field or, more
generally, any commutative ring. See Problem 3.

One important use of the Cayley-Hamilton theorem is to write powers
$A^k$ of $A \in M_n$, for $k \ge n$, as linear combinations of $I, A, A^2, \ldots, A^{n-1}$.
By a linear dependence argument, it is easy to show (since the dimension
of $M_n$, considered as a vector space over the complex numbers, is $n^2$)
that powers $A^{n^2}$ and beyond can be expressed as linear combinations of
lower powers, but the Cayley-Hamilton theorem provides a notable
improvement.




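2.4.3 Example. Let

$$A = \begin{bmatrix} 3 & 1 \\ -2 & 0 \end{bmatrix}$$

Then $p_A(t) = t^2 - 3t + 2$, and $A^2 - 3A + 2I = 0$. Thus, $A^2 = 3A - 2I$;
$A^3 = A(A^2) = 3A^2 - 2A = 3(3A - 2I) - 2A = 7A - 6I$; $A^4 = 7A^2 - 6A =
15A - 14I$, and so on. Also, since the constant term in $p_A(t)$, the deter-
minant of $A$, is nonzero, $A$ is nonsingular, and we may write $A^{-1}$ as a
polynomial in $A$. Again from $p_A(A) = A^2 - 3A + 2I = 0$, we get $2I =
-A^2 + 3A = A(-A + 3I)$, or

$$I = A\left[\tfrac{1}{2}(-A + 3I)\right]$$

This means that $A^{-1} = -\tfrac{1}{2}A + \tfrac{3}{2}I = \begin{bmatrix} 0 & -1/2 \\ 1 & 3/2 \end{bmatrix}$.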
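The computations in Example (2.4.3) can be replayed in exact rational arithmetic. This is a small sketch with a hypothetical helper `matmul`; it checks $p_A(A) = 0$, the reduction of $A^4$, and the polynomial expression for $A^{-1}$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[Fraction(3), Fraction(1)], [Fraction(-2), Fraction(0)]]
I = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
A2 = matmul(A, A)

# p_A(A) = A^2 - 3A + 2I, which the Cayley-Hamilton theorem says is 0.
p_of_A = [[A2[i][j] - 3 * A[i][j] + 2 * I[i][j] for j in range(2)] for i in range(2)]

# A^4 computed directly, and via the reduction A^4 = 15A - 14I.
A4 = matmul(A2, A2)
A4_reduced = [[15 * A[i][j] - 14 * I[i][j] for j in range(2)] for i in range(2)]

# A^{-1} = -(1/2) A + (3/2) I
A_inv = [[Fraction(-1, 2) * A[i][j] + Fraction(3, 2) * I[i][j] for j in range(2)]
         for i in range(2)]
```

Using `Fraction` keeps every identity exact, so the checks are true equalities rather than floating-point approximations.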


Exercise. Given $A \in M_n$ with characteristic polynomial

$$p_A(t) = t^n + a_{n-1}t^{n-1} + a_{n-2}t^{n-2} + \cdots + a_1t + a_0$$

write $A^n$ as a polynomial, of degree at most $n-1$, in $A$. Do the same for
the next few powers of $A$ after $n-1$. Assume further that $A$ is nonsin-
gular ($a_0 \ne 0$), and write $A^{-1}$ as a polynomial, of degree at most $n-1$, in
$A$. We record this latter fact as a corollary to (2.4.2).


2.4.4 Corollary. If $A \in M_n$ is nonsingular, then there is a polynomial
$q(t)$ (whose coefficients depend upon $A$), of degree at most $n-1$, such
that $A^{-1} = q(A)$.


Exercise. If two matrices $A, B \in M_n$ are similar, show that any polynomial
evaluated at one is similar to the same polynomial evaluated at the other, 
and, in particular, any polynomial equation satisfied by one is satisfied 
by the other. Give some thought to the converse: Satisfaction of the same 
polynomial equations implies similarity - true or false? 


2.4.5 Example. We have shown that every matrix $A \in M_n$ satisfies
some polynomial equation of degree $n$. The characteristic polynomial
may be used, for example. It is possible for $A \in M_n$ to satisfy a poly-
nomial equation of degree less than $n$, however. Let

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \in M_3$$

Then $A$ satisfies $q(A) = 0$, where $q(t) = t^2 - 2t + 1$ has degree 2.




Exercise. Show that a diagonalizable matrix satisfies a polynomial equa- 
tion of degree equal to the number of its distinct eigenvalues, and no less. 
The (monic) polynomial of minimum degree that a matrix satisfies - its 
minimum polynomial - will be a subject of further study in the next 
chapter in connection with the Jordan canonical form. Hint: Consider 
$q(t) = (t - \lambda_1) \cdots (t - \lambda_k)$ with $\lambda_i \ne \lambda_j$ for $i \ne j$.


Another use of Schur’s result is to make it clear that every matrix is 
“almost” diagonalizable in two possible interpretations of the phrase. 
The first says that arbitrarily close to a given matrix there is a diagonal- 
izable matrix, and the second says that any given matrix is similar to an 
upper triangular matrix whose off-diagonal entries are arbitrarily small. 


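2.4.6 Theorem. Let $A = [a_{ij}] \in M_n$. For every $\varepsilon > 0$, there exists a
matrix $A(\varepsilon) = [a_{ij}(\varepsilon)] \in M_n$ that has $n$ distinct eigenvalues (and is there-
fore diagonalizable) and is such that

$$\sum_{i,j=1}^{n} |a_{ij} - a_{ij}(\varepsilon)|^2 < \varepsilon$$

Proof: Let $U \in M_n$ be unitary and such that $U^*AU = T$ is upper trian-
gular. Let $E = \operatorname{diag}(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ in which $\varepsilon_1, \ldots, \varepsilon_n$ are numbers chosen
so that

$$|\varepsilon_i| < \left(\frac{\varepsilon}{n}\right)^{1/2}$$

and so that the numbers $t_{11} + \varepsilon_1, t_{22} + \varepsilon_2, \ldots, t_{nn} + \varepsilon_n$ are distinct. (Reflect
for a moment to see that this can be done.) Then $T + E$ has $n$ distinct
eigenvalues: $t_{11} + \varepsilon_1, \ldots, t_{nn} + \varepsilon_n$, and so does $A + UEU^*$, which is similar
to $T + E$. Let $A(\varepsilon) = A + UEU^*$, so that $A - A(\varepsilon) = -UEU^*$ and

$$\sum_{i,j=1}^{n} |a_{ij} - a_{ij}(\varepsilon)|^2 = \sum_{i=1}^{n} |\varepsilon_i|^2 < n\left(\frac{\varepsilon}{n}\right) = \varepsilon$$

We have used (2.2.2). Therefore, $A(\varepsilon)$ satisfies the assertions of the
theorem. □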


Exercise. Show that the condition $\sum_{i,j} |a_{ij} - a_{ij}(\varepsilon)|^2 < \varepsilon$ in (2.4.6) could
be replaced by $\max_{i,j} |a_{ij} - a_{ij}(\varepsilon)| < \varepsilon$. Hint: Apply the theorem with $\varepsilon^2$ in
place of $\varepsilon$ and realize that, if a sum of squares is less than $\varepsilon^2$, each of the
items must be less than $\varepsilon$ in absolute value.


2.4.7 Theorem. Let $A \in M_n$. For every $\varepsilon > 0$ there is a nonsingular
matrix $S_\varepsilon \in M_n$ such that

$$S_\varepsilon^{-1}AS_\varepsilon = T_\varepsilon = [t_{ij}(\varepsilon)]$$

is upper triangular and $|t_{ij}(\varepsilon)| \le \varepsilon$ for $1 \le i < j \le n$.


Proof: First apply Schur's theorem to produce a unitary matrix $U \in M_n$
and an upper triangular matrix $T \in M_n$ such that

$$U^*AU = T$$

Define $D_\alpha = \operatorname{diag}(1, \alpha, \alpha^2, \ldots, \alpha^{n-1})$ for a nonzero scalar $\alpha$ and set $t =
\max_{i<j} |t_{ij}|$. Assume that $\varepsilon < 1$, since it certainly suffices to prove the
statement in this case. If $t \le 1$, let $S_\varepsilon = UD_\varepsilon$, and, if $t > 1$, let $S_\varepsilon = UD_{1/t}D_\varepsilon$.
In either case, the appropriate $S_\varepsilon$ substantiates the claim of the theorem.
If $t \le 1$, for example, a simple calculation reveals that $t_{ij}(\varepsilon) = t_{ij}\,\varepsilon^{-(i-1)}\varepsilon^{j-1} =
t_{ij}\,\varepsilon^{j-i}$, whose absolute value is no more than $\varepsilon^{j-i}$, which is, in turn, no
more than $\varepsilon$ if $i < j$. If $t > 1$, on the other hand, the similarity by $D_{1/t}$
simply preprocesses the matrix, producing one in which all off-diagonal
entries are no more than 1 in absolute value. □
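The diagonal scaling in the proof of (2.4.7) is simple to carry out on an upper triangular matrix. The sketch below assumes $T$ is already upper triangular (the Schur step is skipped) and uses the fact that $D_{1/t}D_\varepsilon = D_{\varepsilon/t}$; the function name `scale_off_diagonal` is hypothetical.

```python
from fractions import Fraction

def scale_off_diagonal(T, eps):
    """Return D^{-1} T D for D = diag(1, a, a^2, ...), with a chosen as in (2.4.7).

    T is assumed upper triangular and 0 < eps < 1. Entry (i, j) of the
    result is t_ij * a^(j - i), so the diagonal is unchanged.
    """
    n = len(T)
    t = max((abs(T[i][j]) for i in range(n) for j in range(i + 1, n)), default=0)
    a = eps if t <= 1 else eps / t        # D_eps, or D_{1/t} D_eps = D_{eps/t}
    return [[T[i][j] * a ** (j - i) for j in range(n)] for i in range(n)]

T = [[1, 5, 7], [0, 2, 9], [0, 0, 3]]
eps = Fraction(1, 10)
T_eps = scale_off_diagonal(T, eps)
```

In this instance the 2,3 entry of the result equals $\varepsilon$ exactly, which shows that this particular construction gives the bound with $\le$ rather than strict inequality.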


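Exercise. Prove the following variation upon (2.4.7): If $A \in M_n$ and $\varepsilon > 0$,
there is a nonsingular $S_\varepsilon \in M_n$ such that $S_\varepsilon^{-1}AS_\varepsilon = T_\varepsilon = [t_{ij}(\varepsilon)]$ is upper
triangular and $\sum_{j>i} |t_{ij}(\varepsilon)| \le \varepsilon$. Hint: Apply (2.4.7) with $[2/n(n-1)]\varepsilon$ in
place of $\varepsilon$.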


An extension of Schur’s theorem, easily proved from it, is an impor- 
tant step toward the Jordan canonical form, to come in the next chapter. 


2.4.8 Theorem. Suppose that $A \in M_n$ has eigenvalues $\lambda_i$ with multi-
plicity $n_i$, $i = 1, \ldots, k$, and that $\lambda_1, \ldots, \lambda_k$ are distinct. Then $A$ is similar to
a matrix of the form

$$\begin{bmatrix} T_1 & & & 0 \\ & T_2 & & \\ & & \ddots & \\ 0 & & & T_k \end{bmatrix}$$

where $T_i \in M_{n_i}$ is upper triangular with all diagonal entries equal to $\lambda_i$,
$i = 1, \ldots, k$. If $A \in M_n(\mathbf{R})$ and if all the eigenvalues of $A$ are real, then the
same result holds, and the similarity matrix may be taken to be real.


Proof: First apply Schur's theorem (2.3.1) to exhibit (unitary) simi-
larity to an upper triangular matrix $T = [t_{ij}]$, and suppose that we have
arranged that all the $\lambda_1$ terms come first, the $\lambda_2$ terms next, and so on, on
the diagonal of $T$. We next perform a sequence of simple (nonunitary)
similarities on $T$ that produce the desired above-diagonal 0's, without
changing the diagonal or the upper triangular structure of $T$. Let $E_{rs}$ be
that matrix in $M_n$ with a 1 in the $r,s$ position and 0's elsewhere. Notice
that, for $r \ne s$ and $\alpha$ any scalar, $I + \alpha E_{rs}$ is nonsingular and $(I + \alpha E_{rs})^{-1} =
I - \alpha E_{rs}$. Furthermore, straightforward calculation reveals that the simi-
larity by $I + \alpha E_{rs}$ for $r < s$,

$$(I + \alpha E_{rs})^{-1}\,T\,(I + \alpha E_{rs}) = (I - \alpha E_{rs})\,T\,(I + \alpha E_{rs})$$

changes entries of $T$ only in the $r$th row, to the right of column $s$, and in
the $s$th column above the $r$th row, and it replaces $t_{rs}$ by

$$t_{rs} + \alpha(t_{rr} - t_{ss})$$

Thus, if $t_{rr} \ne t_{ss}$, the $r,s$ entry may be made 0 by choosing

$$\alpha = \frac{-t_{rs}}{t_{rr} - t_{ss}}$$

without otherwise altering the relevant structure. Now consider the
sequence of positions: $(n-1, n)$; $(n-2, n-1)$, $(n-2, n)$; $(n-3, n-2)$,
$(n-3, n-1)$, $(n-3, n)$; $(n-4, n-3)$, ... in $T$. Make each of these 0, in
turn, via a similarity of the indicated sort, if $t_{rr} \ne t_{ss}$, and notice that no
previously created 0 entry will be disturbed. The resulting matrix will be
similar to $A$ and will have the desired form. □
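The elimination order in the proof of (2.4.8) translates directly into a short routine. The sketch below assumes the input is upper triangular with equal diagonal entries already grouped together; the function name `block_diagonalize` is hypothetical. Each similarity by $I + \alpha E_{rs}$ is applied as one column operation followed by one row operation.

```python
from fractions import Fraction

def block_diagonalize(T):
    """Zero the r,s entries of upper triangular T wherever t_rr != t_ss.

    Positions are visited in the order (n-1,n); (n-2,n-1), (n-2,n); ...
    (written here with 0-based indices), as in the proof of (2.4.8).
    """
    n = len(T)
    T = [[Fraction(v) for v in row] for row in T]
    for r in range(n - 2, -1, -1):
        for s in range(r + 1, n):
            if T[r][r] == T[s][s]:
                continue                      # same eigenvalue: leave the entry
            alpha = -T[r][s] / (T[r][r] - T[s][s])
            for i in range(n):                # T (I + alpha E_rs): col s += alpha * col r
                T[i][s] += alpha * T[i][r]
            for j in range(n):                # (I - alpha E_rs) T: row r -= alpha * row s
                T[r][j] -= alpha * T[s][j]
    return T

T = [[1, 2, 3], [0, 1, 4], [0, 0, 2]]         # eigenvalue 1 twice, then 2
D = block_diagonalize(T)
```

For this $T$ the result is the direct sum of the 2-by-2 block with eigenvalue 1 and the 1-by-1 block with eigenvalue 2; the 1,2 entry survives because $t_{11} = t_{22}$.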




Exercise. Show that if $A \in M_n(\mathbf{R})$ and all its eigenvalues are real, all
operations necessary in the proof of (2.4.8) may be carried out in real 
arithmetic. Thus, in this event, the block diagonal matrix guaranteed by 
the theorem, and the similarity necessary to achieve it, may be taken to 
be real. 


Remark: Suppose a given matrix $A \in M_n$ is upper triangular and, by a
permutation similarity if necessary, suppose it has been reduced to the
form

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ & A_{22} & \cdots & A_{2k} \\ & & \ddots & \vdots \\ 0 & & & A_{kk} \end{bmatrix}$$

in which each diagonal block $A_{ii}$ is upper triangular and has only $\lambda_i$ on
the diagonal, and suppose $\lambda_i \ne \lambda_j$ if $i \ne j$. The algorithm used to prove
Theorem (2.4.8) shows that $A$ is similar to

$$\begin{bmatrix} A_{11} & & 0 \\ & \ddots & \\ 0 & & A_{kk} \end{bmatrix}$$

That is, in this situation, all the off-diagonal blocks may be replaced with
zero blocks and similarity is preserved. Because unitary similarity pre-
serves the sum of squares of the absolute values of the entries, notice that
it is not possible to accomplish this result with a unitary similarity if any
off-diagonal block $A_{ij}$ is nonzero.

We now use the commuting families version (2.3.3) of Schur's
theorem to show that the eigenvalues "add" - in some order - for com-
muting matrices.


2.4.9 Theorem. Let $A, B \in M_n$ have eigenvalues $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$,
respectively. If $A$ and $B$ commute, there is a permutation $i_1, \ldots, i_n$ of the
indices $1, \ldots, n$, such that the eigenvalues of $A + B$ are $\alpha_1 + \beta_{i_1}, \alpha_2 + \beta_{i_2}, \ldots,
\alpha_n + \beta_{i_n}$. In particular, $\sigma(A + B) \subseteq \sigma(A) + \sigma(B)$ if $A$ and $B$ commute.


Proof: If $A$ and $B$ commute, they may be simultaneously upper-triangu-
larized according to (2.3.3); that is, there is a unitary $U \in M_n$ such that

$$U^*AU = T \quad\text{and}\quad U^*BU = R$$

are both upper triangular, with diagonal entries $\alpha_1, \ldots, \alpha_n$ and $\beta_{i_1}, \ldots, \beta_{i_n}$,
respectively. Observe that

$$U^*(A + B)U = T + R$$

has diagonal entries and therefore eigenvalues

$$\alpha_1 + \beta_{i_1},\ \alpha_2 + \beta_{i_2},\ \ldots,\ \alpha_n + \beta_{i_n}$$

These must also be the eigenvalues of $A + B$ since $A + B$ is similar to
$T + R$. □


2.4.10 Example. Note that, even when $A$ and $B$ commute, not neces-
sarily all numbers of the form $\alpha_i + \beta_j$ occur as eigenvalues of $A + B$. Con-
sider the diagonal matrices

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}$$

and realize that $1 + 4 = 5 \notin \{4, 6\} = \sigma(A + B)$. Thus $\sigma(A + B)$ is contained
in, but is not generally equal to, $\sigma(A) + \sigma(B)$ when $A$ and $B$ commute.
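For diagonal matrices the containment in (2.4.9) can be seen by simple enumeration. The following sketch represents each diagonal matrix by its list of diagonal entries, which is an assumption of convenience; the variable names are hypothetical.

```python
# Diagonal entries (and hence eigenvalues) of the diagonal matrices A and B.
diag_A = [1, 2]
diag_B = [3, 4]

# The spectrum of A + B pairs the entries position by position.
spectrum_of_sum = {a + b for a, b in zip(diag_A, diag_B)}

# The set sum sigma(A) + sigma(B) takes every pairing.
set_sum = {a + b for a in diag_A for b in diag_B}

is_contained = spectrum_of_sum <= set_sum      # subset test
is_equal = spectrum_of_sum == set_sum
```

Here the spectrum of the sum is a proper subset of the set sum, since 5 is missing from it.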


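2.4.11 Example. If $A$ and $B$ do not commute, it is difficult to say much
about $\sigma(A + B)$ in terms of $\sigma(A)$ and $\sigma(B)$. In particular, $\sigma(A + B)$ need
not be contained in $\sigma(A) + \sigma(B)$. Let

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$$

Then $\sigma(A + B) = \{-1, 1\}$, while $\sigma(A) = \sigma(B) = \{0\}$.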


2.4.12 Example. Is the converse of (2.4.9) true? If the eigenvalues of $A$
and $B$ add, in some order, need $A$ and $B$ commute? The answer is no,
even if the eigenvalues of $\alpha A$ and $\beta B$ add, in some order, for all scalars $\alpha$
and $\beta$. This is an interesting phenomenon, and the characterization of
such pairs of matrices is an unsolved problem! Let

$$A = \begin{bmatrix} 1 & 4 & 5 \\ 0 & 2 & 6 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & 1 & 2 \\ 0 & 3 & 3 \\ 0 & 0 & 4 \end{bmatrix}$$

The eigenvalues add, but $A$ and $B$ do not commute. Clearly, simulta-
neous upper triangularizability is sufficient for the additivity of eigen-
values, but it, too, is not necessary. And, of course, upper triangular
matrices need not commute.
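That the two triangular matrices of Example (2.4.12) fail to commute can be checked by forming both products. This is a minimal sketch with a hypothetical helper `matmul`.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 4, 5], [0, 2, 6], [0, 0, 3]]
B = [[2, 1, 2], [0, 3, 3], [0, 0, 4]]

AB = matmul(A, B)
BA = matmul(B, A)

# For upper triangular matrices the eigenvalues are the diagonal entries,
# so the eigenvalues of A + B are the entrywise sums of the two diagonals.
diag_of_sum = [A[i][i] + B[i][i] for i in range(3)]
```

The two products share the same diagonal but differ above it, so the eigenvalues add while the matrices do not commute.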


2.4.13 Corollary. Suppose that $A, B \in M_n$ are commuting matrices with
eigenvalues $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$, respectively. If $\alpha_i \ne -\beta_j$, $i, j =
1, \ldots, n$, then $A + B$ is nonsingular.


Exercise. Verify (2.4.13) using (2.4.9). 


Exercise. Show that for any pair $A, B \in M_n$ (commuting or noncom-
muting) the sum of the eigenvalues of $A + B$ is the sum of the eigenvalues
of $A$ plus the sum of the eigenvalues of $B$. Hint: What is $\operatorname{tr}(A + B)$?


We have considered simultaneous diagonalization of diagonalizable 
matrices, for which commutativity is an easily verified necessary and 
sufficient condition. We have also considered simultaneous triangulariza- 
tion, for which commutativity is a sufficient but not necessary condition. 
Since it is sometimes useful to be able to show that two given matrices 
cannot be simultaneously triangularized, we are interested in stronger
necessary conditions than additivity of the eigenvalues. The following
example points the way toward such conditions. 


2.4.14 Example. Let

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

Both $A$ and $B$ have the eigenvalue 0 with multiplicity 3, as does any
linear combination $aA + bB$, so the eigenvalues add and there is, on
these grounds, reason to believe that $A$ and $B$ might be simultaneously
triangularizable. But if there were some nonsingular $S \in M_3$ such that
$SAS^{-1}$ and $SBS^{-1}$ were both upper triangular, then the eigenvalues of
$(SAS^{-1})(SBS^{-1}) = SABS^{-1}$ would have to be products, in some order, of
the eigenvalues of $A$ and $B$. But the set of eigenvalues of $AB$ is $\{-1, 0, 1\}$,
which is not contained in the set product of $\{0\}$ and $\{0\}$. We conclude
that $A$ and $B$ are not simultaneously upper triangularizable.
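
The two claims in Example (2.4.14) can be confirmed with exact integer arithmetic. The sketch below is an illustration only; the helper names are assumptions introduced here. It uses the fact that a 3-by-3 matrix has 0 as its only eigenvalue exactly when its cube is the zero matrix.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1, 0], [0, 0, -1], [0, 0, 0]]
B = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]

def combination(a, b):
    """The matrix a*A + b*B."""
    return [[a * A[i][j] + b * B[i][j] for j in range(3)] for i in range(3)]

def cube(M):
    return matmul(M, matmul(M, M))

zero = [[0] * 3 for _ in range(3)]
# Sample a grid of integer coefficients; each combination should be nilpotent.
all_nilpotent = all(cube(combination(a, b)) == zero
                    for a in range(-3, 4) for b in range(-3, 4))

AB = matmul(A, B)
print(all_nilpotent)  # True
print(AB)             # [[1, 0, 0], [0, -1, 0], [0, 0, 0]]
```

The product $AB$ is diagonal with entries $1, -1, 0$, so its spectrum can be read off directly and is not contained in $\{0\}\cdot\{0\}$.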


Exercise. Verify the assertions in the preceding example, including the
fact that if $C, D \in M_n$ are both upper triangular, then the eigenvalues
of $CD$ are products of eigenvalues of $C$ and $D$ in some order; that is,

$$\sigma(CD) \subseteq \sigma(C)\,\sigma(D).$$


Simultaneous upper triangularizability by a not necessarily unitary 
similarity [but see Section (2.6)] is completely characterized by the fol- 
lowing theorem of McCoy, whose proof we omit. Recall that we may 
speak of a polynomial in any number of variables; it is simply a linear 
combination of products of powers of the variables. If the variables are 
noncommuting, different powers of the same variables may occur in a 
given product with products of powers of other variables in between. 


2.4.15 Theorem. Let $A, B \in M_n$ with $\sigma(A) = \{\alpha_1,\dots,\alpha_n\}$ and
$\sigma(B) = \{\beta_1,\dots,\beta_n\}$, including multiplicities. There is a
nonsingular $S \in M_n$ such that both $S^{-1}AS$ and $S^{-1}BS$ are upper
triangular if and only if there is a permutation $i_1,\dots,i_n$ of the indices
$1, 2, \dots, n$ such that $\sigma(p(A,B)) = \{p(\alpha_j, \beta_{i_j}) : j = 1,\dots,n\}$
for all polynomials $p(t,s)$ with complex coefficients in two (noncommuting) variables.


Exercise. Verify that the polynomial condition in (2.4.15) is necessary if
$A$ and $B$ are simultaneously triangularizable. Show that, if $A, B \in M_n$
commute, then $\sigma(p(A,B)) = \{p(\alpha_j, \beta_{i_j}) : j = 1,\dots,n\}$ for all
polynomials $p$ in two variables. How is Example (2.4.14) covered by the theorem?




Remark: The result of Theorem (2.4.15) is valid for matrices and
polynomials over an arbitrary field as long as the field contains the
eigenvalues of the matrices; it holds for simultaneous triangularization of
$k = 3, 4, \dots$ matrices (the condition then involves polynomials in $k$
variables); and it even holds for a restricted subset of the eigenvalues, that is,
$p(\alpha_j, \beta_{i_j}) \in \sigma(p(A,B))$, $j = 1,\dots,r$, for all polynomials
$p(s,t)$ if and only if $A$ and $B$ are simultaneously similar to block triangular
matrices with $\alpha_1,\dots,\alpha_r$ and $\beta_{i_1},\dots,\beta_{i_r}$ comprising
1-by-1 blocks in corresponding positions somewhere on the diagonals of the
block triangular matrices that are simultaneously similar to $A$ and $B$, respectively.


Problems 


1. Suppose $A, B \in M_n$ commute and have eigenvalues $\alpha_1,\dots,\alpha_n$ and
$\beta_1,\dots,\beta_n$, respectively. (a) Show that the eigenvalues of $AB$ are
$\alpha_1\beta_{i_1}, \alpha_2\beta_{i_2}, \dots, \alpha_n\beta_{i_n}$ for some permutation
$i_1,\dots,i_n$ of the indices $1,\dots,n$. (b) If $p(t,s)$ is a polynomial in two
variables, show also that $p(A,B)$ has eigenvalues
$p(\alpha_1,\beta_{i_1}),\dots,p(\alpha_n,\beta_{i_n})$. (c) Finally, show that the weaker
assumption of simultaneous upper triangularizability is sufficient for the
previous conclusions; commutativity is not necessary.


2. If $A \in M_n$, show that the rank of $A$ is not less than the number of
nonzero eigenvalues of $A$. Hint: Show that the rank of an upper triangular
matrix is at least as great as the number of nonzero main diagonal
entries, and then use Schur's theorem (2.3.1). Use the example

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

to explain why the rank of $A$ could be greater than the number of nonzero
eigenvalues.


3. The purpose of this problem is to show that the Cayley-Hamilton
theorem holds for matrices whose entries come from any commutative
ring, not just from the complex field. A commutative ring is a mathematical
structure in which all the axioms of a field are satisfied except for
the existence of multiplicative inverses. Thus, there are commutative
operations of "addition" and "multiplication" that obey the usual properties
of associativity and distributivity. We also explicitly assume that
there is a multiplicative unit in the ring; that is, there is an element "1"
such that $1a = a$ for all $a$ in the ring. One example of a ring that is not
necessarily a field is $\mathbb{Z}_k$, the integers modulo $k$. In $\mathbb{Z}_k$,
"addition" and "multiplication" are done as usual, but the result is taken
modulo $k$; $\mathbb{Z}_k$ is a field if and only if $k$ is a prime. Another example
is the set of polynomials in $k$ formal indeterminates with complex coefficients.




(a) Recall that if $A \in M_n$, then $\operatorname{adj} A \in M_n$ is the unique matrix
whose $i,j$ entry is the $j,i$ cofactor of $A$ (0.8.2). Show that the fundamental
identity

$$A(\operatorname{adj} A) = (\operatorname{adj} A)A = (\det A)I$$

is just an expression of Laplace's expansion of the determinant of $A$ by
cofactors and the fact that $\det A = 0$ if any two rows or columns of $A$ are
equal. Observe that this formula involves multiplications and additions
only, not division. Argue that it is valid for matrices whose entries come
from any commutative ring.

(b) Use (a) to show that

$$(tI - A)[\operatorname{adj}(tI - A)] = [\operatorname{adj}(tI - A)](tI - A) = \det(tI - A)\,I = p_A(t)\,I$$

for any $A \in M_n$, or even for any n-by-n matrix $A$ whose entries come
from a commutative ring. Show that the matrix $\operatorname{adj}(tI - A)$ is a matrix
whose entries are polynomials in $t$ of degree at most $n-1$, and hence it
can be written as

$$\operatorname{adj}(tI - A) = A_{n-1}t^{n-1} + A_{n-2}t^{n-2} + \cdots + A_1 t + A_0$$

where the coefficients $A_k$ are n-by-n matrices whose entries are polynomial
functions of the entries of $A$. The polynomial $p_A(t)$ is the characteristic
polynomial of $A$.

(c) Show that

$$t^k I - A^k = (tI - A)(t^{k-1}I + At^{k-2} + \cdots + A^{k-2}t + A^{k-1}) = (tI - A)\,G_k(A,t)$$

for $k = 0, 1, 2, \dots$ if $A$ is an n-by-n matrix whose entries come from a
commutative ring. Conclude that

$$t^k I = I t^k = A^k + (tI - A)\,G_k(A,t), \qquad k = 0, 1, 2, \dots$$


(d) Let $p_A(t) = a_n t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0 = \det(tI - A)$ be the
characteristic polynomial of $A$ (with $a_n = 1$) and observe that it is well
defined for any n-by-n matrix $A$ whose entries come from a commutative
ring. Use (c) to show that

$$p_A(t)\,I = \sum_{k=0}^{n} a_k t^k I = \sum_{k=0}^{n} a_k\,[A^k + (tI - A)\,G_k(A,t)] = p_A(A) + (tI - A)\,G(A,t)$$

where

$$G(A,t) = \sum_{k=0}^{n} a_k\,G_k(A,t)$$

is a polynomial of degree at most $n-1$ in $t$ with matrix coefficients whose
entries are polynomial functions of the entries of $A$. Now use (b) to show
that

$$p_A(A) = p_A(t)\,I - (tI - A)\,G(A,t) = (tI - A)\operatorname{adj}(tI - A) - (tI - A)\,G(A,t) = (tI - A)\,H(A,t) = Q_A(t)$$

where $H(A,t) = B_{n-1}t^{n-1} + B_{n-2}t^{n-2} + \cdots + B_1 t + B_0$ and each $B_i$ is an
n-by-n matrix whose entries are polynomial functions of the entries of $A$
that do not depend on $t$. Thus, $Q_A(t)$ is a polynomial in $t$ of degree at
most $n$ with matrix coefficients.

(e) Now evaluate $Q_A(A)$ and conclude that $p_A(A) = 0$.
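
The conclusion of Problem 3 can be tested exhaustively in a small case. The sketch below is an illustration under stated assumptions, not the argument of the problem: it takes the ring $\mathbb{Z}_6$ (which is not a field), restricts to 2-by-2 matrices, for which $p_A(t) = t^2 - (\operatorname{tr} A)t + \det A$, and evaluates $p_A(A)$ for all $6^4$ matrices. The function name `char_poly_at_matrix` is introduced here for illustration.

```python
from itertools import product

K = 6  # Z_6 is a commutative ring with unit, but not a field

def char_poly_at_matrix(A, k):
    """Evaluate p_A(A) = A^2 - (tr A) A + (det A) I for a 2-by-2 A over Z_k."""
    (a, b), (c, d) = A
    tr, det = (a + d) % k, (a * d - b * c) % k
    A2 = [[(a * a + b * c) % k, (a * b + b * d) % k],
          [(c * a + d * c) % k, (c * b + d * d) % k]]
    I = [[1, 0], [0, 1]]
    return [[(A2[i][j] - tr * A[i][j] + det * I[i][j]) % k for j in range(2)]
            for i in range(2)]

zero = [[0, 0], [0, 0]]
holds_everywhere = all(
    char_poly_at_matrix([[a, b], [c, d]], K) == zero
    for a, b, c, d in product(range(K), repeat=4)
)
print(holds_everywhere)  # True
```

Only additions and multiplications modulo 6 are used, in keeping with the observation in part (a) that no division is needed.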


4. Let $A \in M_n$ be a nonsingular matrix. Show that any matrix that
commutes with $A$ also commutes with $A^{-1}$. Hint: Use (2.4.4); give a direct
argument as well.


5. Use (2.3.1) to show that if $A \in M_n$ has eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, then

$$\sum_{i=1}^{n} \lambda_i^k = \operatorname{tr} A^k, \qquad k = 1, 2, \dots$$

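6. Show that for

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -2 & 1 & 2 \\ -1 & -2 & -1 \\ 1 & 1 & 1 \end{bmatrix}$$

$\sigma(aA + bB) = \{a - 2b,\ 2a - 2b,\ 3a + b\}$ for all scalars $a, b \in \mathbb{C}$, but $A$ and $B$
are not simultaneously similar to upper triangular matrices. $\sigma(AB) = \ ?$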


7. Use the criterion in Problem 6 of Section (2.3) to show that the two 
matrices in Example (2.4.14) cannot be simultaneously upper triangu- 
larized. Apply the same test to the two matrices in Problem 6. 


8. An observation in the spirit of McCoy's theorem (2.4.15) can sometimes
be useful in showing that two matrices are not unitarily equivalent.
Let $p(t,s)$ be a polynomial with complex coefficients in two noncommuting
variables, and let $A, B \in M_n$ be unitarily equivalent with $A = UBU^*$
for some unitary $U \in M_n$. Show that $p(A, A^*) = U p(B, B^*) U^*$.
Conclude that if $A$ and $B$ are unitarily equivalent, then
$\operatorname{tr} p(A, A^*) = \operatorname{tr} p(B, B^*)$ for every complex polynomial
$p(t,s)$ in two noncommuting variables. How is this related to Theorem (2.2.6)?


9. Let $A \in M_n$, $B \in M_m$ be given and suppose $A$ and $B$ have no eigenvalues
in common; that is, $\sigma(A) \cap \sigma(B)$ is empty. Use the Cayley-Hamilton
theorem (2.4.2) to show that the equation $AX - XB = 0$, $X \in M_{n,m}$,
has only the solution $X = 0$. Deduce from this fact that the equation
$AX - XB = C$ has a unique solution $X \in M_{n,m}$ for each given $C \in M_{n,m}$.
Hint: If $AX = XB$, show inductively that $A^k X = XB^k$ for all $k = 1, 2, \dots$
and hence $p(A)X = Xp(B)$ for any polynomial $p(t)$. Choose $p(t)$ to
be the characteristic polynomial of $A$ to obtain $p_A(A)X = 0 = Xp_A(B)$.
Since $p_A(B) = (B - \lambda_1 I)\cdots(B - \lambda_n I)$, where $\lambda_1,\dots,\lambda_n$ are the
eigenvalues of $A$, the matrix $p_A(B)$ is nonsingular and $Xp_A(B) = 0$ has only the
solution $X = 0$. Existence of a solution to $AX - XB = C$ for any given
right-hand side follows from uniqueness of the solution to the homogeneous
equation and (0.5k and l) applied to the linear transformation
$X \to T(X) = AX - XB$ on $M_{n,m}$.
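
The linear transformation $T(X) = AX - XB$ in Problem 9 can be written out as an $nm$-by-$nm$ linear system in the entries of $X$ and solved directly. The sketch below is one possible illustration, not a method prescribed by the text: the function name `solve_sylvester`, the row-major ordering of the unknowns, and the use of exact rational Gauss-Jordan elimination are all choices made here. It assumes that $A$ and $B$ have no eigenvalue in common; otherwise no pivot is found and `next` raises `StopIteration`.

```python
from fractions import Fraction

def solve_sylvester(A, B, C):
    """Solve AX - XB = C for X (n-by-m), assuming A and B share no eigenvalue."""
    n, m = len(A), len(B)
    size = n * m
    idx = lambda i, j: i * m + j          # position of x_ij in the unknown vector
    # Row (i, j) encodes sum_k A[i][k] x_kj - sum_k x_ik B[k][j] = C[i][j].
    M = [[Fraction(0)] * (size + 1) for _ in range(size)]
    for i in range(n):
        for j in range(m):
            r = idx(i, j)
            for k in range(n):
                M[r][idx(k, j)] += A[i][k]
            for k in range(m):
                M[r][idx(i, k)] -= B[k][j]
            M[r][size] = Fraction(C[i][j])
    for col in range(size):               # Gauss-Jordan elimination with row swaps
        pivot = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [v / lead for v in M[col]]
        for r in range(size):
            factor = M[r][col]
            if r != col and factor != 0:
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [[M[idx(i, j)][size] for j in range(m)] for i in range(n)]

A = [[1, 2], [0, 3]]      # eigenvalues 1 and 3
B = [[0, 1], [0, 2]]      # eigenvalues 0 and 2
C = [[1, 0], [0, 1]]
X = solve_sylvester(A, B, C)
residual = [[sum(A[i][k] * X[k][j] for k in range(2))
             - sum(X[i][k] * B[k][j] for k in range(2)) - C[i][j]
             for j in range(2)] for i in range(2)]
print(residual == [[0, 0], [0, 0]])  # True
```

Exact fractions are used so that the residual check is an equality rather than a floating-point comparison.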


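10. Use Problem 9 to give a proof of Theorem (2.4.8) that requires at
most $k-1$ reduction steps. Hint: Write $A$ as

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ 0 & A_{22} & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & 0 & A_{kk} \end{bmatrix} = \begin{bmatrix} A_{11} & R_1 \\ 0 & T \end{bmatrix}$$

in which each $A_{ii}$ is upper triangular with only $\lambda_i$ on the main diagonal
and $R_1 = [A_{12}\ \dots\ A_{1k}]$. Consider

$$S = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}, \qquad S^{-1} = \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix}$$

where $X$ is the same size as $R_1$. Show that

$$S^{-1}AS = \begin{bmatrix} A_{11} & 0 \\ 0 & T \end{bmatrix}$$

provided $X$ is chosen so that $A_{11}X - XT = -R_1$. Continue with the successive
rows and conclude that $A$ is similar to $\operatorname{diag}(A_{11}, A_{22}, \dots, A_{kk})$.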


11. Let $A, B \in M_n$ be given and consider the commutator $C = AB - BA$.
Show that $\operatorname{tr} C = 0$. Consider $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and
$B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ and show that a
commutator need not be nilpotent; that is, some eigenvalues of a commutator
can be nonzero, even though the sum of the eigenvalues must be
zero.


12. Let $A, B \in M_n$, let $C = AB - BA$, and suppose that $A$ commutes
with $C$. Show that $C$ must be nilpotent. Comment on the example in
Problem 11. Hint: Why is there a nonsingular $S \in M_n$ such that
$SCS^{-1} = \operatorname{diag}(C_{11}, C_{22}, \dots, C_{kk}) = C_1$, where each
$C_{ii} \in M_{n_i}$ is upper triangular, $n_1 + n_2 + \cdots + n_k = n$,
$\sigma(C_{ii}) = \{\lambda_i\}$ for $i = 1, 2, \dots, k$, and $\lambda_i \neq \lambda_j$ if $i \neq j$?
Let $A_1 = SAS^{-1}$, $B_1 = SBS^{-1}$, and write $A_1 = (A_{ij})$ and $B_1 = (B_{ij})$ in block
form conformal with the block diagonal form of $C_1$. Show that $A_1C_1 = C_1A_1$
and use Problem 9 to show that $A_{ij} = 0$ if $k > 1$ and $i \neq j$. Then each
$C_{ii} = A_{ii}B_{ii} - B_{ii}A_{ii}$ has $\operatorname{tr} C_{ii} = 0$ and hence $\lambda_i = 0$ and $k = 1$.


13. Using the notation of Problem 9, use (2.4.9) to give another proof
of the fact that the equation $AX - XB = C$ has a unique solution for
every $C \in M_{n,m}$ if $A$ and $B$ have no eigenvalues in common. Hint: Consider
the linear transformations $T_1, T_2 : M_{n,m} \to M_{n,m}$ defined by $T_1(X) = AX$,
$T_2(X) = XB$. Show that $T_1$ and $T_2$ commute, and deduce from (2.4.9)
that the eigenvalues of $T = T_1 - T_2$ are differences of eigenvalues of $T_1$ and $T_2$.
Argue that $\lambda$ is an eigenvalue of $T_1$ if and only if there is a nonzero
$X \in M_{n,m}$ such that $AX - \lambda X = 0$, which can happen if and only if $\lambda$ is an
eigenvalue of $A$ [consider the nonzero column(s) of $X$]. The sets of eigenvalues
of $T_1$ and $A$ are therefore the same, and similarly for $T_2$ and $B$.
Thus, $T$ is nonsingular if $A$ and $B$ have no eigenvalues in common. If $x$ is
an eigenvector of $A$ corresponding to the eigenvalue $\lambda$ and $y$ is an eigenvector
of $B^T$ corresponding to the eigenvalue $\mu$, consider $X = xy^T$, show
that $T(X) = (\lambda - \mu)X$, and conclude that the set of eigenvalues of $T$ consists
of all possible differences of eigenvalues of $A$ and $B$.


14. Let $\mathcal{F} = \{A_i : i \in \mathcal{I}\} \subset M_n$ be a commuting family. Show that
$\mathcal{F}$ can be simultaneously upper-triangularized in such a way that any one given
member is reduced to the special form in (2.4.8) and the other members
are reduced to conformal block diagonal upper triangular form. That is,
for each given $A_0 \in \mathcal{F}$, show that there is a nonsingular $S \in M_n$ such that
$A_i = S \operatorname{diag}(T_1^{(i)}, \dots, T_k^{(i)}) S^{-1}$ for all $i \in \mathcal{I}$, each
$T_j^{(i)} \in M_{n_j}$ is upper triangular for $j = 1, 2, \dots, k$ and all $i \in \mathcal{I}$,
$n_1 + n_2 + \cdots + n_k = n$, and all the main diagonal entries of each $T_j^{(0)}$ are
$\lambda_j$, with $\lambda_j \neq \lambda_i$ if $j \neq i$. Hint:
Choose $S$ so that $S^{-1}A_0S$ has the special block diagonal upper triangular
form in (2.4.8). Note that the family $\{S^{-1}A_iS : i \in \mathcal{I}\}$ is commutative.
Partition each $S^{-1}A_iS$ conformally to the block form of $S^{-1}A_0S$ and
employ commutativity and the result of Problem 9 or 13 (as in Problem
12) to show that all the off-diagonal blocks of each $S^{-1}A_iS$ must vanish.
Now use (2.3.3) on the $k$ families of correspondingly placed diagonal
blocks. Except for $S^{-1}A_0S$, there is, of course, no guarantee that the
eigenvalues of one diagonal block of $S^{-1}A_iS$ are all equal or that different
diagonal blocks have disjoint spectra.


Further Readings and Notes. Theorem (2.4.15) and its generalizations
were proved by N. H. McCoy, "On the Characteristic Roots of Matrix
Polynomials," Bull. Amer. Math. Soc. 42 (1936), 592-600. See also
T. S. Motzkin and O. Taussky, "Pairs of Matrices with Property L,"
Trans. Amer. Math. Soc. 73 (1952), 108-114, where the relation between
eigenvalues and linear combinations is discussed. A pair $A, B \in M_n$ such
that $\sigma(aA + bB) = \{a\alpha_j + b\beta_{i_j} : j = 1,\dots,n\}$ for all $a, b \in \mathbb{C}$ is said to have
property L, and the condition of (2.4.15) is called property P. Clearly
property P implies property L and not conversely. The weaker property
L is not fully understood, although it is known, for example, that a pair
of normal matrices [see (2.5)] with property L must commute and must
therefore be simultaneously unitarily diagonalizable.


2.5 Normal matrices 


The class of normal matrices, which arises naturally in the context of 
unitary equivalence, is important throughout matrix analysis and gen- 
eralizes unitary, real symmetric, and Hermitian matrices. 


2.5.1 Definition. A matrix $A \in M_n$ is said to be normal if $A^*A = AA^*$,
that is, if $A$ commutes with its Hermitian adjoint.


Exercise. Show that $A \in M_n$ is normal if and only if every matrix that is
unitarily equivalent to $A$ is normal. The class of normal matrices is closed
under unitary equivalence.


2.5.2 Examples. 


(a) Since $U^*U = I = UU^*$ if $U$ is unitary, all unitary matrices are
normal.

(b) Since $A^*A = AA^*$ trivially if $A^* = A$, all Hermitian matrices are
normal.

(c) If $A \in M_n$ is such that $A^* = -A$, $A$ is called skew-Hermitian.
In this event, $A^*A = -A^2 = AA^*$, so that all skew-Hermitian
matrices are also normal.


(d) $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ is normal, but it does not fall into any of the above
categories.
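
Definition (2.5.1) translates directly into a test that can be run on small examples. The sketch below is an illustration only; the helper names are assumptions introduced here, and the matrix $\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ is taken as the example of a normal matrix that is neither unitary, Hermitian, nor skew-Hermitian. The comparison is exact, so it is meant for matrices with small integer or Gaussian-integer entries rather than general floating-point data.

```python
def conj_transpose(A):
    """Hermitian adjoint A* of a square matrix given as a list of rows."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_normal(A):
    """True when A*A equals AA* exactly."""
    Astar = conj_transpose(A)
    return matmul(Astar, A) == matmul(A, Astar)

print(is_normal([[1, -1], [1, 1]]))    # True: normal, but in none of categories (a)-(c)
print(is_normal([[0, 1j], [1j, 0]]))   # True: this matrix is skew-Hermitian
print(is_normal([[0, 1], [0, 0]]))     # False: a nilpotent Jordan block
```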


Exercise. Characterize the normal matrices in $M_2(\mathbb{R})$ in terms of relations
among the entries. Present the answer in terms of the categories
(2.5.2a, b, and c). Hint: If $A \in M_2(\mathbb{R})$ is normal, show that either $A = A^T$
or $A = -A^T$ if at least one entry of $A$ is zero. If all entries of $A$ are nonzero,
show that either $A = A^T$ or $AA^T = \alpha I$ for some $\alpha > 0$.




Exercise. Give an example of a 2-by-2 real matrix that is not normal. 
Give an example of a real 2-by-2 matrix that is normal but is not sym- 
metric, skew-symmetric, or orthogonal. 


Exercise. Show that each of the categories (2.5.2a, b, and c) is itself 
closed under unitary equivalence. 


Exercise. Show that a diagonal Hermitian matrix must have real entries 
and a diagonal skew-Hermitian matrix must have pure imaginary entries. 


2.5.3 Definition. If $A \in M_n$ is unitarily equivalent to a diagonal matrix,
$A$ is said to be unitarily diagonalizable, with a similar definition for
orthogonally diagonalizable. Note that "unitarily (or orthogonally)
diagonalizable" implies diagonalizable (but not conversely).


Exercise. Review the proof of (1.3.7) and conclude that $A \in M_n$ is unitarily
diagonalizable if and only if there is a set of $n$ orthonormal vectors
in $\mathbb{C}^n$, each of which is an eigenvector of $A$.


We next catalog the most fundamental facts about normal matrices. 
The equivalence of (a) and (b) in the following theorem is often called the 
spectral theorem for normal matrices. 


2.5.4 Theorem. If $A = [a_{ij}] \in M_n$ has eigenvalues $\lambda_1,\dots,\lambda_n$, the
following statements are equivalent:


(a) $A$ is normal;

(b) $A$ is unitarily diagonalizable;

(c) $\sum_{i,j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{n} |\lambda_i|^2$; and

(d) There is an orthonormal set of $n$ eigenvectors of $A$.
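
Criterion (c) is easy to evaluate for a 2-by-2 matrix, where the eigenvalues come from the quadratic formula. The sketch below is an illustration only; the names `eig2` and `defect_from_normality` are assumptions introduced here. It returns the difference between the two sums in (c), which vanishes for a normal matrix and is positive otherwise.

```python
import cmath

def eig2(m):
    """Eigenvalues of a 2-by-2 matrix via the quadratic formula (hypothetical helper)."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr - root) / 2, (tr + root) / 2]

def defect_from_normality(m):
    """sum |a_ij|^2 minus sum |lambda_i|^2; zero exactly when m is normal."""
    entries = sum(abs(x) ** 2 for row in m for x in row)
    spectrum = sum(z.real ** 2 + z.imag ** 2 for z in eig2(m))
    return entries - spectrum

normal_case = defect_from_normality([[1, -1], [1, 1]])   # eigenvalues 1 + i and 1 - i
triangular_case = defect_from_normality([[1, 2], [0, 3]])
print(normal_case, triangular_case)
```

For the upper triangular matrix the defect equals $|t_{12}|^2 = 4$, which matches the identity used in the proof of the equivalence of (b) and (c).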


Proof: We suppose throughout that $T = [t_{ij}] \in M_n$ is an upper triangular
matrix which is unitarily equivalent to $A$, as guaranteed by Schur's
theorem (2.3.1); that is, $T = U^*AU$ for some unitary $U \in M_n$. Since $T$ is
unitarily equivalent to $A$, the statement (a) is equivalent to the normality
of $T$. We proceed by showing that (a) is equivalent to (b), (b) is equivalent
to (c), and (b) is equivalent to (d).

To show that (a) implies (b), we use a calculation. If $A$ is normal, then
$T$ is normal. But a triangular normal matrix must be diagonal, as may be
seen by equating the diagonal entries of $T^*T$ and $TT^*$. The fact that the
1,1 entry of $T^*T$ is the same as that of $TT^*$ means that

$$\bar t_{11}t_{11} = t_{11}\bar t_{11} + \sum_{j=2}^{n} t_{1j}\bar t_{1j} = |t_{11}|^2 + \sum_{j=2}^{n} |t_{1j}|^2$$

This means that $0 = \sum_{j=2}^{n} |t_{1j}|^2$, a sum of nonnegative terms, each of
which must then be 0. We conclude that

$$t_{1j} = 0, \qquad j = 2,\dots,n$$

The fact that the 2,2 entries of $T^*T$ and $TT^*$ are the same then means that

$$\bar t_{22}t_{22} = t_{22}\bar t_{22} + \sum_{j=3}^{n} t_{2j}\bar t_{2j} = |t_{22}|^2 + \sum_{j=3}^{n} |t_{2j}|^2$$

and we conclude for the same reason as above that

$$t_{2j} = 0, \qquad j = 3,\dots,n$$

In the same manner, assuming we have verified that

$$t_{ij} = 0, \qquad j > i \text{ and } i = 1,\dots,k-1$$

we may conclude that

$$t_{ij} = 0, \qquad j > i \text{ for } i = k$$

Arguing upon each successive diagonal entry in turn, we conclude,
finally, that

$$t_{ij} = 0, \qquad j > i \text{ for } i = 1,\dots,n$$

and, since

$$t_{ij} = 0, \qquad j < i \text{ for } i = 1,\dots,n$$

because $T$ is upper triangular, we have that $T$ is diagonal and (b) holds.


Since diagonal matrices are clearly normal and unitary equivalence pre- 
serves normality, (b) implies (a) also. 
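
The diagonal comparison used above can be seen concretely. The sketch below is an illustration only; the function name is an assumption introduced here. It computes the diagonal of $TT^* - T^*T$ for an upper triangular $T$; the first entry is the sum of $|t_{1j}|^2$ over $j \geq 2$, so it can vanish only if the first row is zero off the diagonal.

```python
def diag_of_TTstar_minus_TstarT(T):
    """Diagonal of TT* - T*T for a square matrix T with real or complex entries."""
    n = len(T)
    row_sums = [sum(abs(T[i][j]) ** 2 for j in range(n)) for i in range(n)]  # (TT*)_ii
    col_sums = [sum(abs(T[k][i]) ** 2 for k in range(n)) for i in range(n)]  # (T*T)_ii
    return [row_sums[i] - col_sums[i] for i in range(n)]

T = [[1, 2, 0], [0, 3, 4], [0, 0, 5]]
D = [[1, 0, 0], [0, 3, 0], [0, 0, 5]]
print(diag_of_TTstar_minus_TstarT(T))  # [4, 12, -16]
print(diag_of_TTstar_minus_TstarT(D))  # [0, 0, 0]
```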

For the equivalence of (b) and (c), we appeal to (2.2.2). Since the
diagonal entries of any diagonalization of $A$ are the eigenvalues $\lambda_1,\dots,\lambda_n$
(in some order), (2.2.2) allows us to deduce (c) from (b). On the other
hand, since the $\lambda_i$, $i = 1,\dots,n$, are the diagonal entries of $T$ (in some
order), (2.2.2) also means that

$$\sum_{i,j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{n} |\lambda_i|^2 + \sum_{i<j} |t_{ij}|^2$$

But (c) means that

$$\sum_{i<j} |t_{ij}|^2 = 0$$

or that $T$ is diagonal, from which (b) follows.


The equivalence of (b) and (d) is the content of the exercise preceding
this theorem. □


Exercise. If $T \in M_n$ is triangular, and if the $i$th diagonal entry of $T^*T$ is
the same as that of $TT^*$, $i = 1,\dots,n$, show that $T$ is diagonal. Explain
why this fact, together with the fact that normality is invariant under
unitary similarity, is the basic reason why a normal matrix is unitarily
diagonalizable.


Exercise. Show that a normal matrix is nondefective (the geometric 
multiplicity is the same as the algebraic multiplicity for each eigenvalue). 


Exercise. Show that if $A \in M_n$ is normal, then $x \in \mathbb{C}^n$ is a right eigenvector
of $A$ corresponding to the eigenvalue $\lambda$ of $A$ if and only if $x$ is a left
eigenvector of $A$ corresponding to $\lambda$; that is, $Ax = \lambda x$ is equivalent to
$x^*A = \lambda x^*$. Hint: Normalize $x$ and write $A = U\Lambda U^*$ with $x$ as the first
column of $U$. Then what is $A^*$? $A^*x$? See Problem 20 at the end of this
section for another proof.


Exercise. If $A \in M_n$ is normal, and if $x$ and $y$ are eigenvectors corresponding
to distinct eigenvalues, show that $x$ and $y$ are orthogonal. Hint:
If $Ax = \lambda x$, $Ay = \mu y$, show that $\mu x^*y = x^*(Ay) = (A^*x)^*y = (\bar\lambda x)^*y = \lambda x^*y$.
If $\lambda \neq \mu$, conclude that $x^*y = 0$. See Problem 21 for another proof.


If the eigenvalues of a normal matrix are known, it can be unitarily 
diagonalized via the following conceptual prescription. Determine each 
eigenspace and find an orthonormal basis for it (using the Gram-Schmidt 
procedure, for example). Since the dimension of each eigenspace is equal 
to the multiplicity of the corresponding eigenvalue, and since A is nor- 
mal, the union of these bases will be an orthonormal basis for the entire 
space. Arraying these vectors as the columns of a unitary matrix produces 
the desired diagonalizing transformation. 

We next note that commuting normal matrices may be simultaneously 
unitarily diagonalized. 


2.5.5 Theorem. If $\mathcal{N} \subseteq M_n$ is a commuting family of normal matrices,
then $\mathcal{N}$ is simultaneously unitarily diagonalizable; that is, there is a single
unitary similarity that transforms each matrix in $\mathcal{N}$ into a diagonal
matrix.


Exercise. Use (2.3.3) and the fact that a triangular normal matrix must 
be diagonal to prove (2.5.5). Explain how both the hypothesis and con- 
clusion of (2.5.5) are stronger than those of (1.3.19). 




The application of (2.5.4) to the special case of Hermitian matrices
yields a fundamental result, often called the spectral theorem for Hermitian
matrices.


2.5.6 Theorem. If $A \in M_n$ is Hermitian, then


(a) All eigenvalues of $A$ are real; and
(b) $A$ is unitarily diagonalizable.


If $A \in M_n(\mathbb{R})$ is symmetric, then $A$ is real orthogonally diagonalizable.


Proof: A diagonal Hermitian matrix must have real diagonal entries, so
(a) follows from (b) and the fact that the set of Hermitian matrices is
closed under unitary equivalence. Statement (b) follows from (2.5.4)
because Hermitian matrices are normal. If $A \in M_n(\mathbb{R})$ is symmetric, then
it is Hermitian, but notice that all calculations necessary to diagonalize $A$
take place over the real field. Since the eigenvalues of $A$ are real, the
corresponding eigenvectors may be taken to be real. □


It is important to realize, in contrast to the discussion of diagonaliz- 
ability in Chapter 1, that distinctness of eigenvalues or the like is no 
longer important in (2.5.4) and (2.5.6), and diagonalizability need not be 
assumed in (2.5.5). A full linearly independent set of eigenvectors (in fact 
an orthonormal set) is structurally guaranteed. This is one reason why 
Hermitian and normal matrices are so important and have such pleasant 
properties. 

We conclude with analogs of (2.5.4) and (2.5.5) for real normal ma- 
trices. Such matrices are normal and therefore can be diagonalized by a 
not necessarily real unitary similarity, but what is the best form such a 
matrix can be put into by a real orthogonal similarity? Since a real 
normal matrix might have no real eigenvalues whatever, there is no guar- 
antee it can be diagonalized with a real similarity. On the other hand, any 
real matrix can be put into a special block upper triangular form by a real 
orthogonal similarity (2.3.4), and this suggests what to do if the matrix is 
also normal. Our proof uses (2.3.4) in the same way that (2.3.1) was used 
in the proof of (2.5.4). The following lemma disposes of a slight tech- 
nical complication that was not present in the proof of (2.5.4). 


2.5.7 Lemma. If $A \in M_n$ is Hermitian and $x^*Ax \geq 0$ for all $x \in \mathbb{C}^n$, then
all the eigenvalues of $A$ are nonnegative. If, in addition, $\operatorname{tr} A = 0$, then
$A = 0$.


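Proof: Use (2.5.6) to write $A = U\Lambda U^*$, where $U = [u_1\ u_2\ \cdots\ u_n] \in M_n$ is
unitary and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$. Then $\Lambda = U^*AU$, so
$\lambda_k = u_k^*Au_k \geq 0$ by hypothesis, and hence all $\lambda_k \geq 0$. Finally,
$\operatorname{tr} A = \operatorname{tr} U\Lambda U^* = \operatorname{tr} \Lambda U^*U = \operatorname{tr} \Lambda = \lambda_1 + \cdots + \lambda_n$,
so if $\operatorname{tr} A = 0$ and all $\lambda_k \geq 0$, we must have all $\lambda_k = 0$,
$\Lambda = 0$, and $A = U\Lambda U^* = U0U^* = 0$. □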


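2.5.8 Theorem. Let $A \in M_n(\mathbb{R})$. Then $A$ is normal if and only if there
is a real orthogonal matrix $Q \in M_n(\mathbb{R})$ such that

$$Q^TAQ = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{bmatrix} \in M_n(\mathbb{R}), \qquad 1 \leq k \leq n \tag{2.5.9}$$

where each $A_j$ is either a real 1-by-1 matrix or is a real 2-by-2 matrix of
the form

$$A_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \tag{2.5.10}$$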


Proof: A direct calculation shows that every matrix of the form (2.5.10)
is normal [$A_jA_j^T = \operatorname{diag}(\alpha_j^2 + \beta_j^2,\ \alpha_j^2 + \beta_j^2) = A_j^TA_j$], so any direct sum of
the form (2.5.9) must also be normal. For the forward implication, we
invoke (2.3.4) to show that it suffices to prove the theorem for a real
normal matrix of the form (2.3.5). Since the main diagonal blocks in
(2.3.5) may be arranged in any prescribed order, we may assume that

$$A = \begin{bmatrix} R & A_{01} & A_{02} & \cdots & A_{0k} \\ & A_{11} & A_{12} & \cdots & A_{1k} \\ & & A_{22} & \cdots & A_{2k} \\ & & & \ddots & \vdots \\ 0 & & & & A_{kk} \end{bmatrix} \in M_n(\mathbb{R}) \tag{2.5.11}$$

is normal with

$$R = \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_p \end{bmatrix} \in M_p(\mathbb{R})$$

upper triangular, $A_{01}, A_{02}, \dots, A_{0k} \in M_{p,2}(\mathbb{R})$, and $A_{ij} \in M_2(\mathbb{R})$ for
$i, j = 1, 2, \dots, k$ and $j \geq i$. We shall show that $R$ is diagonal and that $A_{ij} = 0$
for all $j > i$.

Equating the first p-by-p main diagonal block entries in the identity
$A^TA = AA^T$ corresponding to the block $R$ in (2.5.11) gives the identity

$$R^TR = RR^T + A_{01}A_{01}^T + \cdots + A_{0k}A_{0k}^T \tag{2.5.12}$$


Observe that every matrix $B \in M_n(\mathbb{C})$ of the form $B = EE^*$ for some
$E \in M_{n,m}$ is Hermitian and has the property that $x^*Bx = x^*EE^*x =
(E^*x)^*(E^*x) \geq 0$ for all $x \in \mathbb{C}^n$, and a sum of such matrices has the same
property. Since

$$\operatorname{tr} R^TR = \operatorname{tr} RR^T$$

on general principles, and

$$\operatorname{tr} R^TR = \operatorname{tr} RR^T + \operatorname{tr} A_{01}A_{01}^T + \cdots + \operatorname{tr} A_{0k}A_{0k}^T$$

from (2.5.12), we see that

$$0 = \operatorname{tr} A_{01}A_{01}^T + \cdots + \operatorname{tr} A_{0k}A_{0k}^T$$

By Lemma (2.5.7) and the above observation applied to the real matrix
$B = A_{0j}A_{0j}^T = A_{0j}A_{0j}^*$, we have $\operatorname{tr} A_{0j}A_{0j}^T \geq 0$. Since the sum is zero, each
term is zero, and hence $A_{0j}A_{0j}^T = 0$ for $j = 1,\dots,k$. The $i$th main diagonal
entry of $A_{0j}A_{0j}^T$ is the sum of the squares of the (real) elements in the $i$th
row of $A_{0j}$, so all these elements must be zero, all $A_{0j} = 0$, $j = 1, 2, \dots, k$,
and (2.5.12) reduces to

$$R^TR = RR^T$$

But we have already shown in the proof of (2.5.4) that a triangular
normal matrix must be diagonal, so we have $R = \operatorname{diag}(\lambda_1,\dots,\lambda_p)$, as
asserted.


Equating the main diagonal 2-by-2 block entries in the identity $A^TA = AA^T$
corresponding to the block $A_{11}$ in (2.5.11) and using the fact that all
$A_{0j} = 0$ for $j = 1, 2, \dots, k$ gives the identity

$$A_{11}^TA_{11} = A_{11}A_{11}^T + A_{12}A_{12}^T + \cdots + A_{1k}A_{1k}^T \tag{2.5.13}$$

But $\operatorname{tr}(A_{11}^TA_{11}) = \operatorname{tr}(A_{11}A_{11}^T)$, so

$$\operatorname{tr}(A_{12}A_{12}^T) + \cdots + \operatorname{tr}(A_{1k}A_{1k}^T) = 0, \qquad \operatorname{tr}(A_{1j}A_{1j}^T) \geq 0$$

$$\operatorname{tr}(A_{1j}A_{1j}^T) = 0, \qquad A_{1j}A_{1j}^T = 0, \qquad\text{and}\qquad A_{1j} = 0$$

for $j = 2, 3, \dots, k$, using Lemma (2.5.7) in the same way as before. Thus,
(2.5.13) reduces to $A_{11}^TA_{11} = A_{11}A_{11}^T$; that is, the 2-by-2 block $A_{11}$ is normal.

Examining each successive main diagonal 2-by-2 block entry of the
identity $A^TA = AA^T$ corresponding to the block $A_{ii}$ in (2.5.11) for
$i = 2, 3, \dots, k-1$, and employing this same argument, we conclude that all
the off-diagonal blocks are zero as asserted, and all the diagonal blocks
$A_{ii}$ are normal.

We have shown that a real orthogonal similarity that reduces a real
normal matrix to the form (2.3.5) actually reduces it to the block diagonal
form (2.5.9). We complete the proof by showing that all the diagonal
blocks have the form (2.5.10).

If $A_{jj} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_2(\mathbb{R})$ is normal, then equating the 1,1 and 1,2 entries
of the identity $A_{jj}A_{jj}^T = A_{jj}^TA_{jj}$ gives the identities

$$b^2 = c^2, \quad\text{so}\quad c = \pm b$$

and

$$ac + bd = ab + cd, \quad\text{so}\quad 2b(a - d) = 0 \ \text{ if } c = -b$$

The cases $c = +b$ and $b = 0$ can be excluded since $A_{jj}$ would then be real
symmetric and would have only real eigenvalues; by our construction,
the blocks $A_{jj}$ have conjugate pairs of nonreal eigenvalues. Thus,
$c = -b$, $a = d$, and $A_{jj}$ must have the form (2.5.10). A calculation shows
that a real matrix $\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ has a pair of conjugate complex eigenvalues
$\lambda = a + ib$ and $\bar\lambda = a - ib$. □
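
Both closing claims about the 2-by-2 blocks can be checked on a sample block. The sketch below is an illustration only; the helper names and the choice $a = 3$, $b = 4$ are assumptions introduced here. It compares $MM^T$ with $M^TM$ and evaluates the characteristic polynomial $\det(tI - M)$ at $a \pm ib$.

```python
def block(a, b):
    """A real 2-by-2 block of the form [[a, b], [-b, a]]."""
    return [[a, b], [-b, a]]

def times_transpose(M):
    """Return (M M^T, M^T M) for a real 2-by-2 matrix M."""
    Mt = [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]
    mul = lambda X, Y: [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
                        for i in range(2)]
    return mul(M, Mt), mul(Mt, M)

def char_poly(M, t):
    """det(tI - M) for a 2-by-2 matrix M."""
    return (t - M[0][0]) * (t - M[1][1]) - M[0][1] * M[1][0]

a, b = 3, 4
M = block(a, b)
left, right = times_transpose(M)
print(left == right, left)               # True [[25, 0], [0, 25]]
print(char_poly(M, complex(a, b)) == 0)  # True: a + ib is an eigenvalue
print(char_poly(M, complex(a, -b)) == 0) # True: a - ib is an eigenvalue
```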

As a consequence of this theorem for real normal matrices, we can 
easily deduce the real canonical forms for real matrices that are sym- 
metric, skew-symmetric, or orthogonal. 


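2.5.14 Corollary. Let $A \in M_n(\mathbb{R})$. Then


(a) $A = A^T$ if and only if there is a real orthogonal matrix $Q \in M_n(\mathbb{R})$
such that

$$Q^TAQ = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}, \quad\text{with all } \lambda_i \in \mathbb{R}$$

(b) $A = -A^T$ if and only if there is a real orthogonal matrix $Q \in M_n(\mathbb{R})$
such that

$$Q^TAQ = \begin{bmatrix} 0 & & & & & 0 \\ & \ddots & & & & \\ & & 0 & & & \\ & & & A_1 & & \\ & & & & \ddots & \\ 0 & & & & & A_k \end{bmatrix}$$

where each $A_j \in M_2(\mathbb{R})$ has the form

$$A_j = \begin{bmatrix} 0 & \beta_j \\ -\beta_j & 0 \end{bmatrix}$$

(c) $AA^T = I$ if and only if there is a real orthogonal matrix $Q \in M_n(\mathbb{R})$
such that

$$Q^TAQ = \begin{bmatrix} \lambda_1 & & & & & 0 \\ & \ddots & & & & \\ & & \lambda_p & & & \\ & & & A_1 & & \\ & & & & \ddots & \\ 0 & & & & & A_k \end{bmatrix}$$

where each $\lambda_i = \pm 1$ and each $A_j \in M_2(\mathbb{R})$ has the form

$$A_j = \begin{bmatrix} \cos\theta_j & \sin\theta_j \\ -\sin\theta_j & \cos\theta_j \end{bmatrix}, \qquad \theta_j \in \mathbb{R}$$

Proof: In each case, the hypothesis guarantees that $A$ is a real normal
matrix, so $A$ can be written in the form (2.5.9) and (2.5.10). If $A = A^T$,
then each $A_j = A_j^T$, so all $\beta_j = 0$ and $Q^TAQ$ is diagonal. If $A = -A^T$, then
each $\lambda_i = -\lambda_i$ and each $A_j = -A_j^T$, so all $\lambda_i = 0$ and all $\alpha_j = 0$. If $AA^T = I$,
then each $\lambda_i\lambda_i = 1$ and each $A_jA_j^T = I$, so all $\lambda_i^2 = 1$ and all $\alpha_j^2 + \beta_j^2 = 1$;
we have $\lambda_i = \pm 1$ and $\alpha_j = \cos\theta_j$, $\beta_j = \sin\theta_j$ in this case. □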


If one has a family of commuting real normal matrices, they might not 
be simultaneously real diagonalizable, but they can all be brought simul- 
taneously into block diagonal form (2.5.9). 


2.5.15 Theorem. If $\mathcal{N} \subseteq M_n(\mathbb{R})$ is a commuting family of real normal
matrices, then there is a single real orthogonal matrix $Q$ such that $Q^TAQ$
is of the form (2.5.9) and (2.5.10) for all $A \in \mathcal{N}$.


Proof: Use (2.3.6) to reduce every member of $\mathcal{N}$ to the form (2.3.5) via
one real orthogonal similarity $Q$. The argument in the proof of (2.5.8)
shows that they then have the form (2.5.9). □


Problems 


It is possible to accumulate a much longer list than (2.5.4) of conditions 
on Ae M, that are equivalent to normality. Several more are included 
among the problems. 


1. Show that $A \in M_n$ is normal if and only if the Euclidean length of
$Ax$ is the same as that of $A^*x$ for all $x \in \mathbb{C}^n$. Recall that $(y^*y)^{1/2}$ is the
Euclidean length of $y \in \mathbb{C}^n$.




2. Show that a normal matrix is unitary if and only if all its eigenvalues 
have absolute value 1. 


3. Show that a normal matrix is Hermitian if and only if all its eigen- 
values are real. 


4. Show that a normal matrix is skew-Hermitian if and only if all its 
eigenvalues are pure imaginary (have real part equal to 0). 


5. If $A \in M_n$ is skew-Hermitian (respectively Hermitian), show that $iA$ is
Hermitian (respectively skew-Hermitian).


6. Show that $A \in M_n$ is normal if and only if it commutes with some
normal matrix with distinct eigenvalues.


7. Consider matrices $A \in M_n$ of the form $A = B^{-1}B^*$ for a nonsingular
$B \in M_n$, as in Theorem (2.1.9). (a) Show that $A$ is unitary if and only if $B$
is normal. (b) If $B$ has the form $B = HNH$, where $N$ is normal and $H$ is
Hermitian (and both are nonsingular), show that $A$ is similar to a unitary
matrix.


8. Define $H(A) = \tfrac{1}{2}(A + A^*)$, the Hermitian part, and $S(A) = \tfrac{1}{2}(A - A^*)$,
the skew-Hermitian part, of $A \in M_n$. Then $A = H(A) + S(A)$. Show that
$A$ is normal if and only if $H(A)$ and $S(A)$ commute.


9. If two normal matrices commute, show that their product is normal, 
but show by example that the product of two normal matrices can be 
normal even if the two factors do not commute. 


10. In the notation of Problem 8, show that A is normal if every eigen- 
vector of H(A) is an eigenvector of S(A) (respectively A). 


11. For any complex number $z \in \mathbb{C}$, show that there is some $\theta \in \mathbb{R}$ such
that $\bar z = e^{i\theta}z$. Notice that $[e^{i\theta}] \in M_1$ is a unitary matrix. What do diagonal
unitary matrices $U \in M_n$ look like?


12. Generalize Problem 11 to show that if $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n) \in M_n$,
then there is a diagonal unitary matrix $U$ such that $\bar\Lambda = U\Lambda = \Lambda U$.


13. Use Problem 12 to show that a matrix $A \in M_n$ is normal if and only
if there is a unitary matrix $V \in M_n$ such that $A^* = AV$. How is this related
to Problem 7?


14. If $A \in M_n(\mathbb{R})$ and if all the eigenvalues of $A$ are real, show that $A$ is
normal if and only if it is symmetric.


15. Show that two normal matrices of the same size are similar (in fact,
are unitarily equivalent) if and only if they have the same characteristic
polynomial. Is this true for matrices that are not normal? Hint: Consider

$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$


16. If $A, B \in M_n$ are normal, show that $AB$ need not be normal and that
the nonsingular normal matrices of a given size do not form a multiplicative
group. The unitary normal matrices do form a group, however. Do
the nonsingular Hermitian matrices form a multiplicative group?


17. If $A \in M_n$ is normal and $p(t)$ is a given polynomial, use Definition
(2.5.1) to show that $p(A)$ is normal. Give another proof of this fact using
(2.5.4).


18. If A ∈ M_n has the property that p(A) is normal for some nonzero
polynomial p(t), is A normal? Hint: Consider

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\]

and A^2.


19. Let A ∈ M_n and α ∈ C be given. Show that A is normal if and only if
A + αI is normal.


20. Let A ∈ M_n be a normal matrix and suppose x ∈ C^n is a vector such
that Ax = λx. Use Problems 1 and 19 to show that A*x = λ̄x. Hint: If the
Euclidean length of the vector (A − λI)x is zero, argue that the Euclidean
length of (A − λI)*x is also zero.


21. Use (2.5.4) to show that if A ∈ M_n is normal and if Ax = λx and
Ay = μy with λ ≠ μ, then x and y are orthogonal. Hint: Write A = UΛU*
with U ∈ M_n unitary and Λ = diag(λ_1, ..., λ_n). Let U*x = x' = [x'_i] and
U*y = y' = [y'_i]. Show that Λx' = λx' and deduce from this that x'_i = 0 for
every index i such that λ_i ≠ λ, and similarly for y'. Show that x' and y'
are orthogonal and conclude that x and y are orthogonal.


22. Use (2.5.6) to show that the characteristic polynomial of a Hermitian 
matrix A must have real coefficients even if not all of the entries of A are 
real. 


23. Show that [} ‘| and |; A are both complex symmetric matrices 
(A = A^T), but one is normal and the other is not. There is, therefore, a
crucial difference between real symmetric matrices and complex sym-
metric matrices [see Section (4.4)].


24. If A ∈ M_n is both normal and nilpotent, show that A = 0.


25. Let A ∈ M_n be a given matrix. Show that A is normal if and only if
there is a polynomial p(t) of degree at most n−1 such that A* = p(A).
Hint: If Λ = diag(λ_1, ..., λ_n), use Lagrange interpolation to construct a
polynomial p(t) such that p(Λ) = Λ̄ and then invoke (2.5.4). How does
this "explain" why a normal matrix commutes with its adjoint? If, in
addition, A is real, show that the Lagrange interpolation polynomial
p(t) such that A* = p(A) has real coefficients and A^T = p(A). Thus, a real
normal matrix A has A^T = p(A) for a real polynomial p(t). See (0.9.11.4).


26. Give an example of a real normal matrix that is unitarily similar to a
diagonal matrix but is not real orthogonally similar to a diagonal matrix.
Show that a real matrix A is real orthogonally similar to a diagonal matrix
if and only if A is symmetric (A = A^T).


27. Show that a given matrix A ∈ M_n is normal if and only if

    (Ax)*(Ay) = (A*x)*(A*y)

for all x, y ∈ C^n. Geometrically, this means that the angle between Ax
and Ay is the same as the angle between A*x and A*y for all x, y ∈ C^n.
How is this related to Problem 1?


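28. If A ∈ M_n is normal, show that Ax = 0 if and only if A*x = 0. This
means that the null space of A is the same as the null space of A*. Con-
sider

\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\]

to show that this is not true in general.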


29. Consider the system of linear equations Ax = y, where A ∈ M_n and
y ∈ C^n are given, and suppose A is singular. The given system has a (non-
unique) solution if and only if y*z = 0 for every z ∈ C^n such that A*z = 0
[see (0.6.6)]. If A is normal, however, show that the given system has a
solution if and only if y*w = 0 for every w ∈ C^n such that Aw = 0; that is,
y is orthogonal to the null space of A. If one wants to find all solutions to
a singular system Ax = y, explain why it is computationally more eco-
nomical to do so if A is normal than if it is not.


30. Let n_1, n_2, ..., n_k be given positive integers and let A_j ∈ M_{n_j}, j =
1, 2, ..., k. Show that the direct sum A = A_1 ⊕ ... ⊕ A_k is normal if and
only if each A_j is normal.


31. Show that two normal matrices are similar if and only if they are 
unitarily equivalent. Hint: Show that UAU* and VAV* are unitarily 
equivalent if U and V are unitary. Give an example of two (nonnormal) 
matrices that are similar but not unitarily equivalent. 


32. If A ∈ M_3(R) is a real orthogonal matrix, observe that A has either
one or three real eigenvalues. If it has a positive determinant, use (2.5.14)
to show that it is orthogonally equivalent to the direct sum of [1] ∈ M_1 and
a plane rotation. Discuss the geometrical interpretation of this as a rota-
tion by an angle θ around some fixed axis passing through the origin in
R^3. This is part of Euler's theorem in mechanics: Every motion of a rigid
body is the composition of a translation and a rotation about some axis.




33. If F ⊂ M_n is a commuting family of normal matrices, show that
there exists a single Hermitian matrix B such that for each A_α ∈ F there is
a polynomial p_α(t) of degree at most n−1 such that A_α = p_α(B). Notice
that B is fixed for all of F but the polynomial may depend on the element
of F. Hint: Let U ∈ M_n be a unitary matrix that simultaneously diago-
nalizes every member of F, let B = U diag(1, 2, ..., n) U*, let A_α = UΛ_αU*
with Λ_α = diag(λ_1^{(α)}, ..., λ_n^{(α)}), and take p_α(t) to be the Lagrange interpo-
lation polynomial such that p_α(k) = λ_k^{(α)}, k = 1, 2, ..., n.


34. Show that A ∈ M_n is normal if and only if every eigenvector of A is
also an eigenvector of A*. Hint: Let U ∈ M_n be a unitary matrix whose
first column is an eigenvector of A (and, therefore, of A*). Inspect both
U*AU and U*A*U = (U*AU)* and continue.


35. Verify the following improvement of (2.2.8) in case A, B ∈ M_n are
normal: A is unitarily equivalent to B if and only if tr A^k = tr B^k, k =
1, 2, ..., n. Hint: Use Problem 15, and Problem 12 in Section (1.2).


36. Let A ∈ M_n(R) and suppose AA^T = A^T A, so A is a real normal
matrix. If the eigenvalues of AA^T are all distinct, show that A must be
symmetric. Hint: Use Theorem (2.5.8).


2.6 QR factorization and algorithm


A particular means for calculating a specific unitary Schur upper trian-
gularization (2.3.1) of a given matrix A ∈ M_n, and a popular numerical
method for calculating eigenvalues (under some assumptions), is called
the QR algorithm. It is based on the so-called QR factorization of a
general matrix A ∈ M_{n,m}.


2.6.1 Theorem (QR factorization). If A ∈ M_{n,m} and n ≥ m, there is a
matrix Q ∈ M_{n,m} with orthonormal columns and an upper triangular
matrix R ∈ M_m such that A = QR. If m = n, Q is unitary; if in addition A
is nonsingular, then R may be chosen so that all its diagonal entries
are positive, and in this event, the factors Q and R are both unique. If
A ∈ M_{n,m}(R), then both Q and R may be taken to be real.


Proof: If A ∈ M_{n,m} and rank A = m, the QR factorization of A is just a
description, in matrix notation, of the result of applying the Gram-
Schmidt process (0.6.4) to the columns of A, which comprise an inde-
pendent set in C^n. A natural extension of the Gram-Schmidt algorithm
permits the same description to apply to the general case in which the
columns of A may be dependent. Let A = [a_1 ... a_m] be written in par-
titioned form in terms of its columns a_i ∈ C^n. If a_1 = 0, set q_1 = 0; other-
wise set q_1 = a_1/(a_1* a_1)^{1/2}. For each k = 2, 3, ..., m, compute

\[
y_k = a_k - \sum_{i=1}^{k-1} (q_i^* a_k)\, q_i
\]

just as in the ordinary Gram-Schmidt process. If y_k = 0 (which happens
if and only if a_k is a linear combination of a_1, a_2, ..., a_{k−1}), set q_k = 0;
otherwise set q_k = y_k/(y_k* y_k)^{1/2}. The vectors q_1, ..., q_m are then an orthog-
onal set, each element of which is either a unit vector or the zero vector.
Each vector q_j is a linear combination of a_1, ..., a_j, and the construction
ensures that, conversely, each column a_j is a linear combination of
q_1, ..., q_j. Thus, scalars r_{kj} exist such that

\[
a_j = \sum_{k=1}^{j} r_{kj}\, q_k, \qquad j = 1, 2, \ldots, m \tag{2.6.2}
\]

If we set r_{kj} = 0 for all k > j and set r_{ij} = 0 for all j = 1, 2, ..., m for each
i such that q_i = 0, the upper triangular matrix R = [r_{ij}] ∈ M_m and the
vectors q_1, q_2, ..., q_m are determined, via the outlined procedure, by
a_1, a_2, ..., a_m. The matrix Q = [q_1 ... q_m] ∈ M_{n,m} has orthogonal columns
(some of which may be zero), and (2.6.2) says that A = QR.

If rank A = m, Q has orthonormal columns, and hence a factorization
of the desired form has been achieved. In particular, if m = n and A is
nonsingular, then Q must be unitary by (2.1.4e) and the diagonal entries
of the nonsingular matrix R = Q*A must be nonzero. In this event, because
R is required to be upper triangular, q_1 is a scalar multiple of a_1, and, for
i = 2, 3, ..., m, q_i lies in the one-dimensional space that is the orthogonal
complement of the span of a_1, ..., a_{i−1} with respect to the span of a_1, ..., a_i.
Therefore, each q_i is uniquely determined up to a factor of scale of abso-
lute value 1. Thus, replacement of R by R' = diag(|r_11|/r_11, ..., |r_mm|/r_mm)R
and replacement of Q by Q' = Q diag(r_11/|r_11|, ..., r_mm/|r_mm|) gives
the unique factorization A = Q'R' promised in the statement of the
theorem.

If the columns of A are not independent, take the (orthonormal) set of
nonzero columns of Q and extend it to an orthonormal basis of C^n;
denote the new vectors obtained in this way by z_1, z_2, ..., z_p. Now replace
the first zero column of Q by z_1, replace the second zero column by z_2,
and so on until all zero columns have been replaced in this way; denote
the resulting matrix by Q'. Then Q' has orthonormal columns and QR =
Q'R because the new columns of Q' are matched by zero rows of R. Then
A = Q'R is a factorization of the desired form.

If A is real, notice that all the necessary operations may be carried out
in real arithmetic, so the factors obtained are real. □
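
The first stage of this construction can be sketched in code. The following is a minimal illustration, assuming real entries and a fixed absolute tolerance `tol` for deciding when y_k = 0; the names `gram_schmidt_qr` and `rebuild` are hypothetical and are not taken from the text. It returns a Q whose columns are unit vectors or zero vectors, and does not perform the final step of replacing zero columns.

```python
import math

def gram_schmidt_qr(columns, tol=1e-12):
    """Factor A = QR by the Gram-Schmidt construction, allowing dependent columns.

    `columns` lists the columns a_1, ..., a_m of a real n-by-m matrix.
    Returns (q, r): q lists the columns q_1, ..., q_m (unit or zero vectors),
    and r is the m-by-m upper triangular factor stored as a list of rows.
    """
    m, n = len(columns), len(columns[0])
    q = []
    r = [[0.0] * m for _ in range(m)]
    for j, a in enumerate(columns):
        y = list(a)
        for i in range(j):
            # r[i][j] = q_i^T a_j; this is automatically zero when q_i = 0.
            r[i][j] = sum(q[i][t] * a[t] for t in range(n))
            y = [y[t] - r[i][j] * q[i][t] for t in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm <= tol:
            q.append([0.0] * n)   # a_j depends on a_1, ..., a_{j-1}; row j of r stays zero
        else:
            r[j][j] = norm
            q.append([v / norm for v in y])
    return q, r

def rebuild(q, r):
    """Return the columns of the product QR, as in (2.6.2)."""
    m, n = len(q), len(q[0])
    return [[sum(r[k][j] * q[k][t] for k in range(j + 1)) for t in range(n)]
            for j in range(m)]
```

For the columns (1,0,0), (2,0,0), (1,1,0), the second column depends on the first, so q_2 is the zero vector and the second row of R is zero, which is exactly the situation the last part of the proof repairs.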




Exercise. If A ∈ M_{n,m} and n ≤ m, show that A may be factored as A =
LP, in which L ∈ M_n is lower triangular and P ∈ M_{n,m} has orthonormal
rows, and that there are statements for this factorization parallel to the
remaining ones in (2.6.1).


Exercise. Show that any matrix B ∈ M_n of the form B = A*A, A ∈ M_n,
may be written as B = LL*, where L ∈ M_n is lower triangular with non-
negative diagonal entries. Show that this factorization is unique if A is
nonsingular. This is called the Cholesky factorization of B; every posi-
tive definite matrix may be factored in this way (see Chapter 7). Hint:
Write A = QR.


The QR factorization has considerable numerical significance [e.g.,
(2.6.3)], but it is also of interest as a theoretical tool. For example, the
upper triangularizability of a given matrix A ∈ M_n by unitary similarity
follows immediately from its upper triangularizability by ordinary simi-
larity. If S^{-1}AS = T is upper triangular and S = QR as in (2.6.1), then
R^{-1}Q*AQR = T and Q*AQ = RTR^{-1}, which, as a product of upper
triangular matrices, is again upper triangular. It follows in the same
way that simultaneous triangularization theorems such as (2.4.15) are
actually simultaneous unitary triangularization theorems. That is, if a
given family of matrices in M_n is simultaneously triangularizable by sim-
ilarity, it is also simultaneously triangularizable by unitary equivalence.

We next state the QR algorithm for eigenvalue calculation and briefly
indicate some of its features without proof.


2.6.3 QR algorithm. Let A_0 ∈ M_n be given. Write A_0 = Q_0R_0, where
Q_0 and R_0 are as guaranteed in (2.6.1), and define A_1 = R_0Q_0. Again,
write A_1 = Q_1R_1, with Q_1 unitary and R_1 upper triangular, and continue.
In general, factor A_k = Q_kR_k and define A_{k+1} = R_kQ_k.


Exercise. Show that each A_k produced by the QR algorithm is unitarily
equivalent to A_0, k = 1, 2, ....


Under certain circumstances (for example, if all the eigenvalues of A_0
have distinct absolute values), the QR iterates A_k will converge to an
upper triangular matrix as k → ∞. Since this upper triangular matrix is
unitarily equivalent to A_0, the eigenvalues of A_0 are revealed.
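
The iteration (2.6.3) can be tried on a small example. This is only a sketch under stated assumptions: the matrix is real and every iterate is nonsingular, the factorization is done by Gram-Schmidt with positive diagonal in R, and no shifts or other refinements used by practical implementations are included. The function names are hypothetical.

```python
import math

def qr_square(a):
    """Gram-Schmidt QR of a nonsingular real square matrix given as a list of rows."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        y = cols[j][:]
        for i in range(j):
            r[i][j] = sum(q_cols[i][t] * cols[j][t] for t in range(n))
            y = [y[t] - r[i][j] * q_cols[i][t] for t in range(n)]
        r[j][j] = math.sqrt(sum(v * v for v in y))  # positive because a is nonsingular
        q_cols.append([v / r[j][j] for v in y])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def qr_iterates(a0, steps):
    """Apply A_k = Q_k R_k, A_{k+1} = R_k Q_k for the given number of steps."""
    a = [row[:] for row in a0]
    for _ in range(steps):
        q, r = qr_square(a)
        a = matmul(r, q)
    return a
```

For A_0 with rows (2, 1) and (1, 3), whose eigenvalues (5 ± √5)/2 have distinct absolute values, the entry below the diagonal of `qr_iterates(a0, 60)` is negligible and the diagonal entries approximate the two eigenvalues.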

If A_0 is real, then the QR iterates A_k may be calculated using real
arithmetic. If A_0 has any nonreal eigenvalues, there is no hope that the
QR iterates will converge to an upper triangular matrix, since this upper
triangular limit must be real. Under certain circumstances, however, the
iterates A_k may be chosen so that they converge to a real block upper tri-
angular matrix with 1-by-1 and 2-by-2 main diagonal blocks. A sufficient
condition for this to occur is that all the eigenvalues of A_0 have distinct
moduli, except for the two eigenvalues in each nonreal complex conju-
gate pair, which have the same modulus. Since the eigenvalues of a block
triangular matrix are the union of the sets of eigenvalues of the diag-
onal blocks, the eigenvalues of A_0 are revealed as the 1-by-1 diagonal en-
tries of the block triangular limit of the QR iterates A_k, together with the
(complex conjugate pairs of) eigenvalues of the 2-by-2 diagonal blocks
of the limit, which may be calculated easily using real arithmetic and the
quadratic formula.


2.6.4 Example. That the QR algorithm need not always converge to a
triangular matrix is indicated by the following example. Let

\[
A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
\]

Then σ(A) = {±i} and the eigenvalues do not have distinct moduli. If
A_0 = A, a possible sequence for the algorithm is


wf IL 
wef UP TE el ILE 2 
ife ie 


> 
l 


Another is 


In either case, cycling occurs and the sequence {A_k} does not converge to
an upper triangular matrix. There is a choice of {A_k} that converges to a
block upper triangular matrix, however.
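
The cycling can be checked by hand or with a few lines of code. The sketch below is not a reconstruction of the displayed sequences; it relies on one observation, stated here as an assumption of the illustration: since A is real orthogonal, A = (AD)(D) is a QR factorization for any D = diag(±1, ±1), with Q = AD orthogonal and R = D upper triangular, so one step of the algorithm gives RQ = DAD. The function name `qr_step` is hypothetical.

```python
def qr_step(a, signs):
    """One QR step for a real orthogonal matrix a (list of rows).

    Uses the factorization a = (a d)(d) with d = diag(signs), each sign +1 or -1,
    and returns r q = d a d.
    """
    n = len(a)
    return [[signs[i] * a[i][j] * signs[j] for j in range(n)] for i in range(n)]

a0 = [[0, 1], [-1, 0]]
same = qr_step(a0, [1, 1])          # R = I: the iterate never changes
flipped = qr_step(a0, [1, -1])      # R = diag(1, -1): the iterate becomes -a0
back = qr_step(flipped, [1, -1])    # the same choice again returns to a0
```

With either choice of signs the iterates keep a zero diagonal and nonzero off-diagonal entries, so none of them is upper triangular.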


Problems


1. Let x_1, ..., x_m be given independent vectors in C^n, and let X =
[x_1 x_2 ... x_m] ∈ M_{n,m}. Suppose that the Gram-Schmidt process as described
in (0.6.4) is performed on the vectors x_1, ..., x_m to produce an ortho-
normal set z_1, ..., z_m, and let Z = [z_1 ... z_m] ∈ M_{n,m}. (a) For k = 1, 2, ..., m,
let Z_k = [z_1 z_2 ... z_k x_{k+1} x_{k+2} ... x_m], where z_k is the unit vector produced
by the kth step of the Gram-Schmidt process, so that Z_m = Z. Show that
Z_1 = XΔ_1, Z_2 = Z_1Δ_2, ..., Z_m = Z_{m−1}Δ_m, where each Δ_i is a nonsingular
upper triangular matrix of the form Δ_i = I + an upper triangular matrix
in which all columns but the ith are zero. (b) Let T_k = Δ_1Δ_2...Δ_k,
k = 1, 2, ..., m. Observe that T_k is upper triangular and Z_k = XT_k, k =
1, 2, ..., m. Let T = T_m, so that Z = XT. (c) What is the relationship
between this matrix T and the upper triangular matrix R in the proof of
Theorem (2.6.1)? (d) Show that the first k columns of Z_j and T_j do not
change for j = k+1, k+2, ..., m, so the kth step of the Gram-Schmidt
process produces the kth columns of the final matrices Z and T.


2. Let x_1, ..., x_m be given independent vectors in C^n and let X =
[x_1 ... x_m] ∈ M_{n,m}. Consider the following algorithm:
    I. Set Z = X, and denote Z = [z_1 ... z_m], so that initially each
       column z_i = x_i, i = 1, ..., m.
    II. For k = 1, 2, ..., m, do the following:
        (i) First replace the column z_k by z_k/⟨z_k, z_k⟩^{1/2}; then
        (ii) For j = k+1, k+2, ..., m replace each column z_j by
             z_j − ⟨z_j, z_k⟩ z_k.

We use ⟨x, y⟩ = y*x to denote the usual inner product on C^n. (a) Show
that the final result of this process is a matrix Z with orthonormal col-
umns and that it is the same matrix Z produced by the Gram-Schmidt
process in Problem 1. (b) If Z_k denotes the contents of the matrix Z at
the end of the kth step of the algorithm, k = 1, 2, ..., m, show that Z_1 =
XΔ_1, Z_2 = Z_1Δ_2, ..., Z_m = Z_{m−1}Δ_m, where each Δ_i is a nonsingular upper
triangular matrix of the form Δ_i = I + an upper triangular matrix in
which all rows but the ith are zero. (c) Let T_k = Δ_1Δ_2...Δ_k, k = 1, 2, ..., m.
Observe that T_k is upper triangular, and Z_k = XT_k, k = 1, ..., m. Show
that the first k columns of each T_k are the same as the first k columns of
the matrix T_k in Problem 1, though the respective matrices Δ_k and Z_k
may be different. Let T = T_m. (d) Show that the first k columns of Z_j and
T_j do not change for j = k+1, k+2, ..., m, so the kth step of the algo-
rithm produces the kth column of the final matrices Z and T. This algo-
rithm is known as the modified Gram-Schmidt process; it produces the
same result as the ordinary Gram-Schmidt process through a rearrange-
ment of the calculation. Although the modified and ordinary Gram-
Schmidt algorithms are mathematically equivalent, the modified Gram-
Schmidt is preferred for numerical computations because it requires less
storage and, in difficult problems in which some columns of X are nearly
parallel, it tends to produce a computed Z whose columns are more
nearly orthogonal than the Z produced by the ordinary Gram-Schmidt
algorithm. Its performance can easily be improved further in difficult
problems by a column-pivoting strategy: Before performing step II(i),
first choose as z_k a remaining column z_j, j ≥ k, whose squared length
z_j* z_j is greatest. In numerical computations one actually obtains Δ_k^{-1} at
each step (no inversion is required) and accumulates the products of
these factors to compute the triangular factor in the QR decomposition
of X.
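
Steps I and II of the algorithm in Problem 2 translate almost line for line into code. The following is a minimal sketch, assuming real vectors (so that ⟨x, y⟩ = y^T x) and independent columns; it omits the column-pivoting refinement, and the name `modified_gram_schmidt` is hypothetical.

```python
import math

def modified_gram_schmidt(x_columns):
    """Orthonormalize independent real columns by the modified Gram-Schmidt process."""
    z = [list(col) for col in x_columns]          # Step I: start with Z = X
    m = len(z)
    n = len(z[0])
    for k in range(m):                            # Step II
        norm = math.sqrt(sum(v * v for v in z[k]))
        z[k] = [v / norm for v in z[k]]           # II(i): normalize column k
        for j in range(k + 1, m):                 # II(ii): update every later column
            c = sum(z[j][t] * z[k][t] for t in range(n))
            z[j] = [z[j][t] - c * z[k][t] for t in range(n)]
    return z
```

The difference from the ordinary process is that each projection coefficient is computed from the current, already updated column z_j rather than from the original x_j.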


3. Show how the QR factorization may be achieved using a sequence of 
multiplications by Householder transformations. Show that n — 1 House- 
holder transformations are necessary and that Q is a product of these. 
This method is known to be computationally superior to the Gram- 
Schmidt argument used in the proof of (2.6.1). 


4. If the QR algorithm applied to A_0 ∈ M_n converges to an upper trian-
gular matrix, how may eigenvectors of A_0 be calculated? Hint: One
needs to solve a (singular) triangular system with zero right-hand side.


5. Let the QR algorithm be applied to a given matrix A ∈ M_n, and let
{A_k} be the sequence of QR iterates. If the sequence converges, and if
lim_{k→∞} A_k = B, use the selection principle (2.1.8) to explain carefully
why B is unitarily equivalent to A. Why is this important?


Further Readings. For additional references and a detailed description of
efficient computational implementations of the Gram-Schmidt, modified
Gram-Schmidt, and several other orthogonalization procedures, see pp.
146-169 of [GVL]. For further discussion of the QR algorithm, proofs,
and additional references, see the survey by D. Watkins, "Understanding
the QR Algorithm," SIAM Rev. 24 (1982), 427-440 as well as [Ste].


CHAPTER 3 


Canonical forms 


3.0 Introduction 


When are two given matrices similar? We know that similar matrices have 
the same trace, determinant, characteristic polynomial, and eigenvalues, 
but the example 


\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\tag{3.0.1}
\]

shows that two matrices can have all four of these quantities the same
without being similar. If there were some nonsingular S ∈ M_2 such that
A = SBS^{-1} = S0S^{-1} = 0, then we would have a contradiction, since A ≠ 0.


Exercise. Compute the trace, determinant, characteristic polynomial,
and eigenvalues of the two matrices in (3.0.1). Show that A^2 = 0.


Since two matrices that look very different can still be similar, one 
approach to determining whether two given matrices are similar is to 
have some set of “simple” matrices of prescribed form and then see if 
both given matrices can be reduced by similarity to one of these “simple” 
forms. If they can, then they must be similar (because the similarity 
relation is transitive and symmetric). What “simple” forms would be 
suitable for this purpose? 

Every complex matrix A is (unitarily) similar to an upper triangular 
matrix whose diagonal entries (the eigenvalues of A) may be arranged in 
any given order (2.3.1), so two matrices are similar if they are similar 
to the same upper triangular matrix. However, two upper triangular
matrices with the same main diagonal entries and different off-diagonal
entries can still be similar. Thus, if we have succeeded in reducing the two 
given matrices to two unequal upper triangular matrices with the same 
main diagonal, we cannot conclude that the matrices are not similar. 
There is too much freedom here; the n(n +1)/2 nonzero entries (or, more 
precisely, “not necessarily zero” entries) in an upper triangular matrix are 
too numerous to distinguish similarity. There is no uniqueness about the 
triangular form. 

If the class of upper triangular matrices is too large for our purposes,
what about the class of diagonal matrices? If each of two given matrices
is similar to a diagonal matrix, then they are indeed similar to each other
if and only if the two diagonal matrices have the same main diagonal
entries, counting multiplicities but ignoring order. The reason is that one
can use a permutation similarity PDP^T to present the main diagonal
entries of a diagonal matrix D in any prescribed order. Although this
solves the problem of uniqueness that we had with the upper triangular
matrices, we now have an existence problem: Not every complex matrix
is similar to a diagonal matrix.


Exercise. Show that the matrix A in (3.0.1) is not diagonalizable. Hint: If
A = SΛS^{-1}, then Λ = B.


If we search for an upper triangular form that is as nearly diagonal as 
possible but is still attainable by similarity for every matrix, the result is 
the Jordan canonical form, which we discuss in the next section. 

We have been considering similarity of two given matrices A, B ∈ M_n,
but there are other equivalence relations of interest in matrix theory. For
example, we might be interested in whether A can be transformed into B
by a unitary similarity, or by applying only elementary row and column
operations. If A and B are real, we might want to know if A is similar to
B via a real similarity. If A and B are Hermitian, we might want to know
if there is a nonsingular S ∈ M_n such that A = SBS*. If A and B are sym-
metric, we might want to know if there is a nonsingular S ∈ M_n such that
A = SBS^T.

In each of these examples, we have an equivalence relation on a set of
matrices and we are interested in whether two given matrices are in the
same equivalence class. One approach to solving this problem is to seek a
"simple" set of representative matrices of prescribed form, one from each
equivalence class, and to try to reduce each given matrix to one of them.
If this approach is to be successful at all, each equivalence class must
actually contain a representative of the prescribed form (this fails for
diagonal matrices under similarity), and it is very desirable to have only
one representative (or perhaps a small and readily described set of equiv-
alent representatives) in each class (this fails for upper triangular matrices
under similarity). Such a set of representatives is often called a canonical
form, and we consider several examples in this chapter. Others will arise
in particular contexts in later chapters.


3.1 The Jordan canonical form: a proof 


The Jordan canonical form is a set of “almost diagonal” matrices, called 
Jordan matrices, which includes the diagonal matrices. It has the property 
that every equivalence class (under similarity) of square complex matrices 
contains a Jordan matrix, and any two Jordan matrices in the same equiv- 
alence class are essentially the same in a trivial way. A Jordan matrix that 
is similar to a given matrix is called the Jordan canonical form (or some- 
times the Jordan normal form) of the matrix. Once one knows the 
Jordan canonical form of a matrix, all the linear algebraic information 
about the given matrix (i.e., linear transformation) is known at a glance. 


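3.1.1 Definition. A Jordan block J_k(λ) is a k-by-k upper triangular
matrix of the form

\[
J_k(\lambda) = \begin{bmatrix}
\lambda & 1 & & 0 \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda
\end{bmatrix}
\tag{3.1.2}
\]

There are k−1 terms "+1" in the superdiagonal; the scalar λ appears k
times on the main diagonal. All other entries are zero, and J_1(λ) = [λ]. A
Jordan matrix J ∈ M_n is a direct sum of Jordan blocks

\[
J = \begin{bmatrix}
J_{n_1}(\lambda_1) & & & 0 \\
 & J_{n_2}(\lambda_2) & & \\
 & & \ddots & \\
0 & & & J_{n_k}(\lambda_k)
\end{bmatrix},
\qquad n_1 + n_2 + \cdots + n_k = n
\tag{3.1.3}
\]

in which the orders n_i may not be distinct and the values λ_i need not be
distinct.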

Notice that if each Jordan block J_{n_i}(λ_i) in (3.1.3) is one-dimensional,
that is, all n_i = 1 and k = n, then the Jordan matrix J is diagonal. If any
Jordan block J_m(λ) in (3.1.3) has m > 1, then J is not only not diagonal,
it is not even diagonalizable. If J_m(λ) = SΛS^{-1} with Λ diagonal, then
necessarily Λ = diag(λ, λ, ..., λ) = λI. Thus, J_m(λ) − λI = SΛS^{-1} − λI =
λI − λI = 0, which is not the case if m > 1. There is one eigenvector of J
associated with each separate Jordan block; it is the standard basis vector
associated with the first diagonal entry of each J_m(λ) in J.

The main result of this section is that every complex matrix is similar 
to an essentially unique Jordan matrix. We shall proceed to this final con- 
clusion in three steps: 


Step 1. Observe that every complex matrix is similar to an upper 
triangular matrix whose eigenvalues appear on the main diag- 
onal in a prescribed order; this is the Schur triangularization 
theorem (2.3.1). 

Step 2. Then show that an upper triangular matrix can be trans- 
formed by similarity into a block diagonal matrix in which 
each individual diagonal block has all its diagonal entries 
equal [like the Jordan block (3.1.2)]. This is Theorem (2.4.8). 

Step 3. Finally, show that an upper triangular matrix whose 
main diagonal entries are all equal is similar to a direct sum of 
Jordan blocks (3.1.2). 


Once we have proved the last assertion, the reduction of an arbitrary 
complex matrix to Jordan form follows by combining the similarity 
transformations required for each step. 

We shall also be interested in concluding that if a matrix is real and has
only real eigenvalues, then the reduction to Jordan canonical form can
be accomplished with a real similarity. Toward this end, recall (2.3.1)
that if a real matrix A has only real eigenvalues, then there is a real uni-
tary (real orthogonal) matrix U such that U^T AU is upper triangular and
hence it has only real entries. Moreover, the proof of (2.4.8) shows that if
an upper triangular matrix A is real, then there is a real similarity matrix
S such that S^{-1}AS is a (real) block diagonal matrix in which each diag-
onal block is upper triangular with all its main diagonal entries equal.
Thus, it suffices to show that step 3 can be accomplished and that, if we
start with a real upper triangular matrix with all main diagonal entries
real, then the similarity matrix that reduces it to a direct sum of Jordan
blocks may be taken to be real.

The following lemma is helpful in proving that step 3 can be accom-
plished. Its proof is an entirely straightforward computation.


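3.1.4 Lemma. Let k ≥ 1 be given, and consider the Jordan block

\[
J_k(0) = \begin{bmatrix}
0 & 1 & & 0 \\
 & 0 & \ddots & \\
 & & \ddots & 1 \\
0 & & & 0
\end{bmatrix}
\]

Then

\[
J_k^T(0)\, J_k(0) = \begin{bmatrix} 0 & 0 \\ 0 & I_{k-1} \end{bmatrix}
\quad\text{and}\quad
J_k(0)^p = 0 \ \text{ if } p \ge k
\]

Moreover,

\[
J_k(0)\, e_{i+1} = e_i \ \text{ for } i = 1, 2, \ldots, k-1
\quad\text{and}\quad
[I - J_k^T(0)\, J_k(0)]\, x = (x^T e_1)\, e_1
\]

Here, I_{k−1} ∈ M_{k−1} is the identity matrix, e_i is the ith standard unit basis
vector, and x ∈ C^k.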
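
The identities in the lemma are easy to confirm for a particular k. The following sketch checks them for k = 4 using exact integer arithmetic; the helper names are hypothetical, and the check is an illustration rather than a proof.

```python
def jordan_block_zero(k):
    """Return J_k(0) as a list of rows: ones on the superdiagonal, zeros elsewhere."""
    return [[1 if j == i + 1 else 0 for j in range(k)] for i in range(k)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def matmul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def matpow(x, p):
    """Return x**p for an integer p >= 1 by repeated multiplication."""
    result = x
    for _ in range(p - 1):
        result = matmul(result, x)
    return result

def apply(x, v):
    """Return the matrix-vector product x v."""
    return [sum(x[i][t] * v[t] for t in range(len(v))) for i in range(len(x))]

k = 4
J = jordan_block_zero(k)
JtJ = matmul(transpose(J), J)                            # diag(0, 1, 1, 1)
x = [7, -2, 5, 3]
residual = [x[i] - apply(JtJ, x)[i] for i in range(k)]   # (I - J^T J) x = x_1 e_1
```

Here `residual` keeps only the first entry of x, which is the statement [I − J_k^T(0) J_k(0)]x = (x^T e_1)e_1.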

We now prove that the reduction in step 3 can always be accom- 
plished. Recall that a strictly upper triangular matrix is a triangular 
matrix with zero entries on the main diagonal. Notice that an upper 
triangular matrix with equal main diagonal entries is a scalar multiple of 
the identity plus a strictly upper triangular matrix. 


3.1.5 Theorem. Let A ∈ M_n be strictly upper triangular. There is a
nonsingular S ∈ M_n and integers n_1, n_2, ..., n_m with n_1 ≥ n_2 ≥ ... ≥ n_m ≥ 1
and n_1 + n_2 + ... + n_m = n such that

\[
A = S \begin{bmatrix}
J_{n_1}(0) & & & 0 \\
 & J_{n_2}(0) & & \\
 & & \ddots & \\
0 & & & J_{n_m}(0)
\end{bmatrix} S^{-1}
\tag{3.1.6}
\]

If A is real, the matrix S may be chosen to be real.


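Proof: If n = 1, A = [0] and the result is trivial. We proceed by induc-
tion on n and assume that n > 1 and that the result has been proved
for all strictly upper triangular matrices of order less than n. Partition
A as

\[
A = \begin{bmatrix} 0 & a^T \\ 0 & A_1 \end{bmatrix}
\]

where a ∈ C^{n−1}, and A_1 ∈ M_{n−1} is strictly upper triangular. By the induc-
tion hypothesis, there is a nonsingular matrix S_1 ∈ M_{n−1} such that S_1^{-1}A_1S_1
has the desired form (3.1.6); that is,

\[
S_1^{-1} A_1 S_1 = \begin{bmatrix}
J_{k_1} & & 0 \\
 & \ddots & \\
0 & & J_{k_s}
\end{bmatrix}
= \begin{bmatrix} J_{k_1} & 0 \\ 0 & J \end{bmatrix}
\tag{3.1.7}
\]

with k_1 ≥ k_2 ≥ ... ≥ k_s ≥ 1, k_1 + k_2 + ... + k_s = n−1, J_{k_i} = J_{k_i}(0), and

\[
J = \begin{bmatrix}
J_{k_2} & & 0 \\
 & \ddots & \\
0 & & J_{k_s}
\end{bmatrix} \in M_{n-k_1-1}
\]

Notice that no diagonal Jordan block in J has order greater than k_1, so
J^{k_1} = 0 by Lemma (3.1.4). A simple computation shows that

\[
\begin{bmatrix} 1 & 0 \\ 0 & S_1^{-1} \end{bmatrix}
A
\begin{bmatrix} 1 & 0 \\ 0 & S_1 \end{bmatrix}
= \begin{bmatrix} 0 & a^T S_1 \\ 0 & S_1^{-1} A_1 S_1 \end{bmatrix}
\tag{3.1.8}
\]

Partition a^T S_1 = [a_1^T a_2^T] consistent with the partition of the far right side
of (3.1.7); that is, a_1 ∈ C^{k_1} and a_2 ∈ C^{n−k_1−1}, and write (3.1.8) as

\[
\begin{bmatrix} 1 & 0 \\ 0 & S_1^{-1} \end{bmatrix}
A
\begin{bmatrix} 1 & 0 \\ 0 & S_1 \end{bmatrix}
= \begin{bmatrix}
0 & a_1^T & a_2^T \\
0 & J_{k_1} & 0 \\
0 & 0 & J
\end{bmatrix}
\]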

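Now consider the following similarity of this matrix:

\[
\begin{bmatrix} 1 & -a_1^T J_{k_1}^T & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}
\begin{bmatrix} 0 & a_1^T & a_2^T \\ 0 & J_{k_1} & 0 \\ 0 & 0 & J \end{bmatrix}
\begin{bmatrix} 1 & a_1^T J_{k_1}^T & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}
= \begin{bmatrix} 0 & a_1^T (I - J_{k_1}^T J_{k_1}) & a_2^T \\ 0 & J_{k_1} & 0 \\ 0 & 0 & J \end{bmatrix}
= \begin{bmatrix} 0 & (a_1^T e_1) e_1^T & a_2^T \\ 0 & J_{k_1} & 0 \\ 0 & 0 & J \end{bmatrix}
\tag{3.1.9}
\]

where we have used the identity (I − J_k^T J_k)x = (x^T e_1)e_1 from Lemma
(3.1.4). There are now two possibilities, depending on whether a_1^T e_1 = 0
or not.

If a_1^T e_1 ≠ 0, then

\[
\begin{bmatrix} 1/(a_1^T e_1) & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & (1/(a_1^T e_1)) I \end{bmatrix}
\begin{bmatrix} 0 & (a_1^T e_1) e_1^T & a_2^T \\ 0 & J_{k_1} & 0 \\ 0 & 0 & J \end{bmatrix}
\begin{bmatrix} a_1^T e_1 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & (a_1^T e_1) I \end{bmatrix}
= \begin{bmatrix} 0 & e_1^T & a_2^T \\ 0 & J_{k_1} & 0 \\ 0 & 0 & J \end{bmatrix}
= \begin{bmatrix} \tilde{J} & e_1 a_2^T \\ 0 & J \end{bmatrix}
\]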


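Notice that

\[
\tilde{J} = \begin{bmatrix} 0 & e_1^T \\ 0 & J_{k_1} \end{bmatrix} = J_{k_1+1}(0)
\]

is a Jordan block of order k_1 + 1 with zero main diagonal. Using the
property that J̃e_{i+1} = e_i for i = 1, 2, ..., k_1, one shows easily that

\[
\begin{bmatrix} I & e_2 a_2^T \\ 0 & I \end{bmatrix}
\begin{bmatrix} \tilde{J} & e_1 a_2^T \\ 0 & J \end{bmatrix}
\begin{bmatrix} I & -e_2 a_2^T \\ 0 & I \end{bmatrix}
= \begin{bmatrix} \tilde{J} & -\tilde{J} e_2 a_2^T + e_1 a_2^T + e_2 a_2^T J \\ 0 & J \end{bmatrix}
= \begin{bmatrix} \tilde{J} & e_2 a_2^T J \\ 0 & J \end{bmatrix}
\]

and that one can proceed recursively to compute the series of similarities

\[
\begin{bmatrix} I & e_{i+1} a_2^T J^{i-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} \tilde{J} & e_i a_2^T J^{i-1} \\ 0 & J \end{bmatrix}
\begin{bmatrix} I & -e_{i+1} a_2^T J^{i-1} \\ 0 & I \end{bmatrix}
= \begin{bmatrix} \tilde{J} & e_{i+1} a_2^T J^{i} \\ 0 & J \end{bmatrix},
\qquad i = 2, 3, \ldots
\]

Since J^{k_1} = 0, we see that after at most k_1 steps in this series of similari-
ties, the off-diagonal term will finally vanish. We conclude that A is sim-
ilar to the matrix

\[
\begin{bmatrix} \tilde{J} & 0 \\ 0 & J \end{bmatrix}
\]

which is a strictly upper triangular Jordan matrix of the required form.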
If a_1^T e_1 = 0, then (3.1.9) shows that A is similar to the matrix

\[
\begin{bmatrix} 0 & 0 & a_2^T \\ 0 & J_{k_1} & 0 \\ 0 & 0 & J \end{bmatrix}
\]

which is permutation similar to the matrix

\[
\begin{bmatrix} J_{k_1} & 0 & 0 \\ 0 & 0 & a_2^T \\ 0 & 0 & J \end{bmatrix}
\tag{3.1.10}
\]

By the induction hypothesis, there is a nonsingular S_2 ∈ M_{n−k_1} such that

\[
S_2^{-1} \begin{bmatrix} 0 & a_2^T \\ 0 & J \end{bmatrix} S_2 = \hat{J} \in M_{n-k_1}
\]

is a Jordan matrix with zero main diagonal. Thus, the matrix (3.1.10),
and therefore A itself, is similar to

\[
\begin{bmatrix} J_{k_1} & 0 \\ 0 & \hat{J} \end{bmatrix}
\]

which is a Jordan matrix of the required form, except that the diagonal
Jordan blocks might not be arranged in nonincreasing order. A block
permutation similarity, if necessary, will produce the required form.

Finally, observe that if A is real then all the similarities used in this
proof can be chosen to be real, so A is similar via a real similarity to the
required Jordan matrix. □


Theorem (3.1.5) essentially completes step 3 of the promised program
for exhibiting the Jordan canonical form. Notice that if

\[
A = \begin{bmatrix}
\lambda & & * \\
 & \ddots & \\
0 & & \lambda
\end{bmatrix}
\]

is an upper triangular matrix with all diagonal entries equal to λ, then
A_0 = A − λI is strictly upper triangular. If S ∈ M_n is nonsingular and
S^{-1}A_0S is a direct sum of basic Jordan blocks J_{n_i}(0), as guaranteed by
(3.1.5), then S^{-1}AS = S^{-1}A_0S + λI is a direct sum of basic Jordan blocks
J_{n_i}(λ). Steps 1 and 2, carried out in Sections (2.3) and (2.4), together with
step 3 explicitly demonstrate the existence half of the Jordan canonical
form theorem:


3.1.11 Theorem. Let A ∈ M_n be a given complex matrix. There is a non-
singular matrix S ∈ M_n such that

\[
A = S \begin{bmatrix}
J_{n_1}(\lambda_1) & & & 0 \\
 & J_{n_2}(\lambda_2) & & \\
 & & \ddots & \\
0 & & & J_{n_k}(\lambda_k)
\end{bmatrix} S^{-1} = SJS^{-1}
\tag{3.1.12}
\]

and n_1 + n_2 + ... + n_k = n. The Jordan matrix J of A is unique up to per-
mutations of the diagonal Jordan blocks. The eigenvalues λ_i, i = 1, ..., k,
are not necessarily distinct. If A is a real matrix with only real eigen-
values, then the similarity matrix S can be taken to be real.


Proof: We have proved everything except the uniqueness assertion. If $A, B \in M_n$ are similar, then for any scalar $\lambda \in \mathbf{C}$ and any exponent $m = 1, 2, \ldots$, the matrices $(A - \lambda I)^m$ and $(B - \lambda I)^m$ are also similar; in particular, their ranks are equal. Thus, it suffices to show that the set of Jordan blocks (including repetitions) lying on the diagonal of a Jordan matrix $J \in M_n$ is determined completely by the finitely many integers $\operatorname{rank}(J - \lambda I)^m$, $m = 1, 2, \ldots, n$, $\lambda \in \sigma(J)$.

First consider a Jordan block $J_k(\mu) \in M_k$ of the form (3.1.2) for some given $\mu \in \mathbf{C}$ and $m \geq 1$. If $\mu \neq 0$, then $\operatorname{rank} J_k(\mu)^m = \operatorname{rank} J_k(\mu)^{m+1} = k$, so $\operatorname{rank} J_k(\mu)^m - \operatorname{rank} J_k(\mu)^{m+1} = 0$. If $\mu = 0$ and $m \geq k$, then $J_k(0)^m = J_k(0)^{m+1} = 0$, so $\operatorname{rank} J_k(0)^m - \operatorname{rank} J_k(0)^{m+1} = 0$ again. Finally, if $\mu = 0$ and $m < k$, then $\operatorname{rank} J_k(0)^m - \operatorname{rank} J_k(0)^{m+1} = 1$.

Now let $J \in M_n$ be a Jordan matrix of the form (3.1.3), let $\lambda \in \sigma(J)$, and define $r_m(\lambda) = \operatorname{rank}(J - \lambda I)^m$ for $m = 1, 2, \ldots$; set $r_0(\lambda) = n$. It follows from the preceding analysis of the case of one block that the difference $d_m(\lambda) = r_{m-1}(\lambda) - r_m(\lambda)$ is equal to the total number of Jordan blocks $J_k(\lambda)$ in $J$ of all sizes $k \geq m$ and that $d_m(\lambda) = 0$ for all $m > n$. The number of Jordan blocks in $J$ with exact size $k = m$ is therefore equal to $d_m(\lambda) - d_{m+1}(\lambda) = r_{m-1}(\lambda) - 2r_m(\lambda) + r_{m+1}(\lambda)$ for $m = 1, 2, \ldots, n$. □
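
As a small check of the counting formula at the end of the proof, consider the Jordan matrix $J = J_2(0) \oplus J_1(0) \in M_3$ and $\lambda = 0$. This matrix is chosen here only for illustration and does not come from the text. The ranks, their first differences, and the resulting block counts work out as follows:

```latex
\begin{align*}
r_0(0) &= 3, \quad r_1(0) = \operatorname{rank} J = 1, \quad
          r_2(0) = \operatorname{rank} J^2 = 0, \quad r_3(0) = 0, \\
d_1(0) &= r_0 - r_1 = 2, \quad d_2(0) = r_1 - r_2 = 1, \quad d_3(0) = r_2 - r_3 = 0, \\
\#\{\text{blocks of size } 1\} &= d_1 - d_2 = 1, \qquad
\#\{\text{blocks of size } 2\} = d_2 - d_3 = 1 .
\end{align*}
```

The counts agree with the two blocks that were used to build $J$.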


Exercise. Let $A \in M_n$ have Jordan canonical form $J$, let $\lambda$ be an eigenvalue of $A$ with algebraic multiplicity $\nu$, and let $b_k$ denote the number of Jordan blocks $J_k(\lambda)$ of size $k$ in $J$, $k = 1, \ldots, n$. If $r_m(\lambda) = \operatorname{rank}(A - \lambda I)^m$ for $m \geq 1$ and $r_0(\lambda) = n$, show that: (a) $r_m(\lambda)$ and $b_i$ satisfy the linear equations

\[ r_m(\lambda) = n - \nu + \sum_{i=m+1}^{n} (i - m)\, b_i, \qquad m = 0, 1, \ldots, n-1 \]

(b) These equations have a unique solution. (c) The solution is $b_m = r_{m-1}(\lambda) - 2r_m(\lambda) + r_{m+1}(\lambda)$, $m = 1, 2, \ldots, n$, where $r_{n+1}(\lambda) = r_n(\lambda) = n - \nu$.


In order to have a standard presentation of the Jordan canonical form (3.1.12), it is conventional to choose some ordering of the distinct eigenvalues $\lambda_1, \lambda_2, \ldots$ of $A$ and to present first all the Jordan blocks corresponding to $\lambda_1$, then those corresponding to $\lambda_2$, and so on. The Jordan blocks corresponding to each distinct eigenvalue are presented in decreasing (nonincreasing) order with the largest block first, then the next largest, and so on. Since multiple blocks of the same size corresponding to the same eigenvalue are identical, this presentation gives a uniquely determined Jordan canonical form once the ordering of the eigenvalues is given. Every similarity equivalence class of matrices in $M_n$ contains one and only one such Jordan canonical form.

Although our derivation of the Jordan canonical form is an explicit 
algorithm that can, in principle, be used to compute the Jordan form of 
a given matrix, it is definitely not recommended for automatic numerical 
computation by computer. The unfortunate truth is that not only can it 
produce spurious results, but in fact there exists no numerically stable way 
to compute Jordan canonical forms. A simple example will make this clear. 


If $A_\epsilon = \begin{bmatrix} \epsilon & 0 \\ 1 & 0 \end{bmatrix}$ and $\epsilon \neq 0$, then $A_\epsilon = S_\epsilon J_\epsilon S_\epsilon^{-1}$ with $S_\epsilon = \begin{bmatrix} \epsilon & 0 \\ 1 & 1 \end{bmatrix}$ and $J_\epsilon = \begin{bmatrix} \epsilon & 0 \\ 0 & 0 \end{bmatrix}$. If we let $\epsilon \to 0$, then $J_\epsilon \to \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$, which cannot be the Jordan form of the nonzero matrix $A_0 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$. In fact, $A_0$ has $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ as its Jordan form.

Since the Jordan form of a matrix need not be a continuous function of the entries of the matrix, it is possible that small variations in the entries of a matrix will result in large variations in the entries of the Jordan form. There is no hope of computing such an object in a stable way, so the Jordan canonical form is little used in numerical applications.

Despite this limitation, the Jordan canonical form is well worth knowing and is a rich source of insights. As a matter of general technique, when one has something to prove about matrices it is well to consider first whether it can be proved for diagonal matrices and, if this is successful, then to see if some limiting argument may establish the result in general (using the fact that any complex matrix can be approximated arbitrarily closely by a diagonalizable matrix). If this does not work, or if one prefers to avoid an analytical argument, one might next try to prove the result for upper triangular or Jordan matrices.

It is sometimes useful to know that every matrix is similar to a matrix of the form (3.1.12) in which all the "+1" terms in the Jordan blocks are replaced by $\epsilon \neq 0$, and $\epsilon$ can be taken to be arbitrarily small.


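3.1.13 Corollary. Let $A \in M_n$ be a given complex matrix, and let $\epsilon \neq 0$ be given. Then there exists a nonsingular matrix $S = S(\epsilon) \in M_n$ such that

\[ A = S \begin{bmatrix} J_{n_1}(\lambda_1, \epsilon) & & & 0 \\ & J_{n_2}(\lambda_2, \epsilon) & & \\ & & \ddots & \\ 0 & & & J_{n_k}(\lambda_k, \epsilon) \end{bmatrix} S^{-1} \tag{3.1.14} \]

where $n_1 + n_2 + \cdots + n_k = n$ and

\[ J_m(\lambda, \epsilon) = \begin{bmatrix} \lambda & \epsilon & & 0 \\ & \lambda & \ddots & \\ & & \ddots & \epsilon \\ 0 & & & \lambda \end{bmatrix} \in M_m \]

If $A$ is real with real eigenvalues and $\epsilon \in \mathbf{R}$, then $S$ may be taken to be real.

Proof: First find a nonsingular matrix $S_1 \in M_n$ such that $S_1^{-1}AS_1$ is in Jordan canonical form (with a real $S_1$ if $A$ is real and has real eigenvalues). Then take $D_\epsilon = \operatorname{diag}(1, \epsilon, \epsilon^2, \ldots, \epsilon^{n-1})$ and compute $D_\epsilon^{-1}(S_1^{-1}AS_1)D_\epsilon$. This matrix is of the form (3.1.14), so $S = S(\epsilon) = S_1 D_\epsilon$ meets the requirements of the theorem. □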
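
The diagonal scaling used in this proof can be checked on a single Jordan block. The sketch below is only an illustration under stated assumptions: it uses exact rational arithmetic from the Python standard library, and the helper names `matmul`, `jordan_block`, and `rescale` are hypothetical, not notation from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def jordan_block(lam, n):
    """The n-by-n Jordan block with eigenvalue lam and 1's on the superdiagonal."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]

def rescale(J, eps):
    """Return D^{-1} J D for D = diag(1, eps, ..., eps^(n-1)), with eps != 0."""
    n = len(J)
    eps = Fraction(eps)
    D = [[eps ** i if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    D_inv = [[eps ** (-i) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    return matmul(matmul(D_inv, J), D)

# A 3-by-3 block with eigenvalue 5: every superdiagonal 1 becomes eps.
J = jordan_block(5, 3)
J_eps = rescale(J, Fraction(1, 100))
for row in J_eps:
    print([str(x) for x in row])
```

Entry $(i, i+1)$ of $D_\epsilon^{-1} J D_\epsilon$ is $\epsilon^{-(i-1)} \cdot 1 \cdot \epsilon^{i} = \epsilon$, which is what the printed rows show.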


Problems 
1. Supply the computational details to prove Lemma (3.1.4). 


2. Carry out the three steps in the proof of (3.1.11) to find the Jordan 
canonical forms of 


3. Let $A \in M_n$ be a matrix with complex entries, but with only real eigenvalues. Show that $A$ is similar to a real matrix. Can the similarity matrix be chosen to be real?


Further Readings. Our proof of (3.1.11) is in the spirit of R. Fletcher 
and D. Sorensen, “An Algorithmic Derivation of the Jordan Canonical 
Form,” Amer. Math. Monthly 90 (1983), 12-16, which has additional ref- 
erences. [Ste] discusses the Jordan canonical form from the point of view 
of numerical computations and gives examples of its sensitivity to pertur- 
bations in the entries of the matrix. [Str] presents a nice proof. 


3.2 The Jordan canonical form: some observations and applications

3.2.1 The structure of a Jordan matrix. The Jordan matrix

\[ J = \begin{bmatrix} J_{n_1}(\lambda_1) & & & 0 \\ & J_{n_2}(\lambda_2) & & \\ & & \ddots & \\ 0 & & & J_{n_k}(\lambda_k) \end{bmatrix}, \qquad n_1 + n_2 + \cdots + n_k = n \tag{3.2.1.1} \]

has a definite structure that makes apparent certain basic properties of the matrix and of any matrix that is similar to it.


1. The number k of Jordan blocks (counting multiple occurrences 
of the same block) is the number of linearly independent eigen- 
vectors of J. 

2. The matrix J is diagonalizable if and only if k =n. 

3. The number of Jordan blocks corresponding to a given eigenvalue
is the geometric multiplicity of the eigenvalue, which is the dimension
of the associated eigenspace. The sum of the orders of all
the Jordan blocks corresponding to a given eigenvalue is the algebraic
multiplicity of the eigenvalue.

4. A Jordan matrix is not completely determined in general by a knowledge of the eigenvalues and their algebraic and geometric multiplicity. One must also know the sizes of the Jordan blocks corresponding to each eigenvalue. The size of the largest Jordan block corresponding to an eigenvalue $\lambda$ is the multiplicity of $\lambda$ as a root of the minimal polynomial (3.3.6).

5. The sizes of the Jordan blocks corresponding to a given eigenvalue are determined by a knowledge of the ranks of certain powers. For example, if

\[ J = J_3(2) \oplus J_2(2) \oplus J_2(2) \oplus J_1(2) \in M_8 \]

compute

\[ J - 2I = J_3(0) \oplus J_2(0) \oplus J_2(0) \oplus J_1(0) \]

whose square has exactly one nonzero entry, and $(J - 2I)^3 = 0$. Thus, we know that

\[ (J - 2I)^3 = 0, \qquad \operatorname{rank}(J - 2I)^2 = 1, \qquad \operatorname{rank}(J - 2I) = 4, \qquad J - 2I \text{ is 8-by-8} \]

This list of numbers is sufficient to determine the block structure of $J$. The fact that $(J - 2I)^3 = 0$ tells us that the largest block has order 3. The rank of $(J - 2I)^2$ will be the number of blocks of order 3, so there is only one. The rank of $(J - 2I)$ is twice the number of blocks of order 3 plus the number of blocks of order 2, so there are two of them. The number of blocks of order 1 is $8 - (2 \times 2) - 3 = 1$. The same procedure can be applied to direct sums of Jordan blocks of any size so long as all the blocks correspond to the same eigenvalue. If $J$ is such a direct sum corresponding to the eigenvalue $\lambda$, then the smallest integer $k_1$ such that $(J - \lambda I)^{k_1} = 0$ is the size of the largest block. The rank of $(J - \lambda I)^{k_1 - 1}$ is the number of blocks of order $k_1$, the rank of $(J - \lambda I)^{k_1 - 2}$ is twice the number of blocks of order $k_1$ plus the number of blocks of size $k_1 - 1$, and so forth. The sequence of ranks of $(J - \lambda I)^{k_1 - i}$, $i = 0, 1, 2, \ldots, k_1 - 1$, recursively determines the orders of all the blocks in $J$.

6. The sizes of all the Jordan blocks in a general Jordan matrix (3.2.1.1) are determined by a knowledge of the ranks of certain powers. If $\lambda_1$ is an eigenvalue of a Jordan matrix $J \in M_n$, then only the Jordan blocks corresponding to $\lambda_1$ will be annihilated when one forms $(J - \lambda_1 I), (J - \lambda_1 I)^2, \ldots$, because the other blocks in $J - \lambda_1 I$ all have nonzero diagonal entries. Eventually, the rank of $(J - \lambda_1 I)^k$ will stop decreasing (one need not consider any $k > n$); the smallest value of $k$ for which the rank of $(J - \lambda_1 I)^k$ attains its minimum value is the order of the largest block corresponding to $\lambda_1$. This smallest value of $k$ is called the index of the eigenvalue $\lambda_1$. An analysis of the ranks of the sequence of powers of $J - \lambda_1 I$ is sufficient to determine the sizes and numbers of Jordan blocks corresponding to $\lambda_1$. One then proceeds to $\lambda_2$, $\lambda_3$, and so on and determines all the blocks in this same way.
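
The rank analysis in the observations above can be sketched in code. This is an illustration for small exact examples only, not a numerical method: it assumes the matrix entries and the eigenvalue are integers or rationals so that ranks can be computed exactly, and the helper names (`rank`, `jordan_block_counts`, `direct_sum_of_blocks`) are hypothetical. The block counts come from the second differences $r_{m-1} - 2r_m + r_{m+1}$ of the rank sequence $r_m = \operatorname{rank}(A - \lambda I)^m$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def rank(M):
    """Rank over the rationals, by exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

def jordan_block_counts(A, lam):
    """Return {size: count} for the Jordan blocks of A belonging to eigenvalue lam."""
    n = len(A)
    B = [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    ranks = [n]                                   # r_0 = n
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(n + 1):                        # r_1, ..., r_{n+1}
        P = matmul(P, B)
        ranks.append(rank(P))
    counts = {}
    for m in range(1, n + 1):
        b = ranks[m - 1] - 2 * ranks[m] + ranks[m + 1]
        if b:
            counts[m] = b
    return counts

def direct_sum_of_blocks(lam, sizes):
    """Jordan matrix with one eigenvalue lam and blocks of the given sizes."""
    n = sum(sizes)
    J = [[0] * n for _ in range(n)]
    start = 0
    for k in sizes:
        for i in range(start, start + k):
            J[i][i] = lam
            if i + 1 < start + k:
                J[i][i + 1] = 1
        start += k
    return J

# An 8-by-8 matrix with rank(J-2I) = 4, rank(J-2I)^2 = 1, (J-2I)^3 = 0.
J = direct_sum_of_blocks(2, [3, 2, 2, 1])
print(jordan_block_counts(J, 2))   # {1: 1, 2: 2, 3: 1}
```

The same function applies to any matrix similar to a Jordan matrix, since the ranks involved are similarity invariants.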


Although the preceding list of observations was made about a Jordan matrix $J$, each observation also applies to any matrix that is similar to $J$. Thus, if one is given a matrix $A \in M_n$, the Jordan canonical form of $A$ (but not the similarity that transforms $A$ to Jordan canonical form) can be determined by the following procedure:

1. Find all the distinct eigenvalues of $A$, perhaps by finding the roots of the characteristic polynomial.

2. For each distinct eigenvalue $\lambda_i$ of $A$, form $(A - \lambda_i I)^k$ for $k = 1, 2, \ldots$, and analyze the sequence of ranks of these matrices to discover the orders and numbers of all the Jordan blocks of $A$ corresponding to the eigenvalue $\lambda_i$.

This algorithm may be useful in analyzing small matrices of simple form by hand, but it is useless for automatic computation because determination of the rank of a matrix is an inherently unstable process. The example $A_\epsilon = \begin{bmatrix} 1 & 0 \\ 0 & \epsilon \end{bmatrix}$ makes this clear; $A_\epsilon$ has rank 2 for all $\epsilon \neq 0$, but it has rank 1 for $\epsilon = 0$.

As an example of a situation in which this algorithm is useful, consider the problem of determining the Jordan canonical form of the square of a Jordan block $J_k(0) \in M_k$:

\[ A = J_k^2(0) = \begin{bmatrix} 0 & 0 & 1 & & 0 \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & \ddots & 0 \\ 0 & & & & 0 \end{bmatrix} \]

The eigenvalues of $A$ are all zero, $A^m = 0$ for $m = [(k+1)/2]$ = the greatest integer in $(k+1)/2$, and $A^p \neq 0$ if $p = 1, 2, \ldots, m-1$. The rank of each power $A^{p+1}$ is 2 less than the rank of its predecessor $A^p$ for $p = 1, 2, \ldots, m-2$; $A^{m-1}$ has rank 2 if $k$ is even and has rank 1 if $k$ is odd. Thus, the Jordan canonical form of $A = J_k^2(0)$ is

\[ \begin{bmatrix} J_m(0) & 0 \\ 0 & J_m(0) \end{bmatrix} \quad \text{if } k = 2m \text{ is even} \]

and

\[ \begin{bmatrix} J_m(0) & 0 \\ 0 & J_{m-1}(0) \end{bmatrix} \quad \text{if } k = 2m-1 \text{ is odd} \]

This observation is useful in determining whether a given matrix has a square root. For example, it shows that there is no matrix $A \in M_2$ such that $A^2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$.

3.2.2 Linear systems of ordinary differential equations. One application of the Jordan canonical form that is of considerable theoretical importance is to the analysis of solutions of a system of ordinary differential equations with constant coefficients. Let $A \in M_n$ be given, and consider the first-order initial value problem

\[ x'(t) = Ax(t), \qquad x(0) = x_0 \text{ given} \tag{3.2.2.1} \]

where $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T$, and the prime ($'$) denotes differentiation with respect to $t$. If $A$ is not a diagonal matrix, this system of equations is coupled; that is, $x_i'(t)$ is related not only to $x_i(t)$ but to the other components of the vector $x(t)$ as well. This coupling makes the problem hard to solve, but if $A$ can be transformed to diagonal (or almost diagonal) form, the amount of coupling can be reduced or even eliminated and the problem may be easier to solve. If $A = SJS^{-1}$, where $J$ is the Jordan canonical form of $A$, then (3.2.2.1) becomes

\[ y'(t) = Jy(t), \qquad y(0) = y_0 \text{ given} \tag{3.2.2.2} \]

where $x(t) = Sy(t)$ and $y_0 = S^{-1}x_0$. If the problem (3.2.2.2) can be solved, then each component of the solution $x(t)$ to (3.2.2.1) is just a linear combination of the components of the solution to (3.2.2.2), and the linear combinations are given by $S$.

If $A$ is diagonalizable, then $J$ is a diagonal matrix, and (3.2.2.2) is just an uncoupled set of equations of the form $y_k'(t) = \lambda_k y_k(t)$, which has the solution $y_k(t) = y_k(0)e^{\lambda_k t}$. If the eigenvalue $\lambda_k$ is real this is a simple exponential, and if $\lambda_k = a_k + ib_k$ is complex, it is an oscillatory term $y_k(t) = y_k(0)e^{a_k t}[\cos(b_k t) + i\sin(b_k t)]$.

If $J$ is not diagonal, the solution is more complicated. The components of $y(t)$ that correspond to different Jordan blocks in $J$ are not coupled, so it suffices to consider the case in which $J$ is a single Jordan block

\[ J = \begin{bmatrix} \lambda & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix} \in M_m \]

The system (3.2.2.2) is

\[ \begin{aligned} y_1'(t) &= \lambda y_1(t) + y_2(t) \\ &\;\;\vdots \\ y_{m-1}'(t) &= \lambda y_{m-1}(t) + y_m(t) \\ y_m'(t) &= \lambda y_m(t) \end{aligned} \]


which can be solved in a straightforward way from the bottom up. Starting with the last equation, we obtain

\[ y_m(t) = y_m(0)e^{\lambda t} \]

so that

\[ y_{m-1}'(t) = \lambda y_{m-1}(t) + y_m(0)e^{\lambda t} \]

This has the solution

\[ y_{m-1}(t) = y_m(0)te^{\lambda t} + y_{m-1}(0)e^{\lambda t} \]

which can now be used in the next equation, which becomes

\[ y_{m-2}'(t) = \lambda y_{m-2}(t) + y_m(0)te^{\lambda t} + y_{m-1}(0)e^{\lambda t} \]

This has the solution

\[ y_{m-2}(t) = y_m(0)\frac{t^2}{2}e^{\lambda t} + y_{m-1}(0)te^{\lambda t} + y_{m-2}(0)e^{\lambda t} \]

and so forth. It is clear that each component of the solution is of the form

\[ y_k(t) = e^{\lambda t}q_k(t) \]

where $q_k(t)$ is a polynomial of degree at most $m - k$, $k = 1, \ldots, m$.

From this analysis, we conclude that for any given initial condition $x_0$, the components of the solution $x(t)$ of the problem (3.2.2.1) have the form

\[ x_j(t) = p_1(t)e^{\lambda_1 t} + p_2(t)e^{\lambda_2 t} + \cdots + p_k(t)e^{\lambda_k t} \]

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the distinct eigenvalues of $A$ and each $p_i(t)$ is a polynomial whose degree is strictly less than the order of the largest Jordan block corresponding to the eigenvalue $\lambda_i$. Real eigenvalues produce pure exponential terms, while complex eigenvalues may produce mixed exponential and oscillatory terms.

3.2.3 Similarity of a matrix and its transpose. Every Jordan block is permutation-similar to its transpose, as can be seen from the similarity

\[ \begin{bmatrix} 0 & & 1 \\ & \iddots & \\ 1 & & 0 \end{bmatrix} \begin{bmatrix} \lambda & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix} \begin{bmatrix} 0 & & 1 \\ & \iddots & \\ 1 & & 0 \end{bmatrix} = \begin{bmatrix} \lambda & & & 0 \\ 1 & \ddots & & \\ & \ddots & \ddots & \\ 0 & & 1 & \lambda \end{bmatrix} \]

Therefore, if $A \in M_n$ is given and $A = SJS^{-1}$ is its Jordan canonical form, then $A$ is similar to $J$, which is similar to $J^T$, which is similar to $A^T = (S^T)^{-1}J^T(S^T)$. The conclusion is that every complex matrix is similar to its transpose. From this it follows that the row rank (the maximum number of linearly independent rows) of a complex matrix is the same as its column rank (the maximum number of linearly independent columns) because rank is a similarity invariant. It also follows from this that $A$ and $A^T$ have the same eigenvalues, but all of these consequences are more easily established directly.

It is also the case that every matrix in $M_n(F)$ is similar, via a matrix in $M_n(F)$, to its transpose for any field $F$; it is not necessary to assume that $F = \mathbf{C}$. In fact, the similarity matrix may be taken to be symmetric.


3.2.4 Commuting matrices and nonderogatory matrices. If $p(t)$ is a polynomial, and if $A \in M_n$ is a given matrix, it is a useful, if obvious, fact that $p(A)$ commutes with $A$. What about the converse? If $A, B \in M_n$ are given and if $A$ commutes with $B$, is $B$ necessarily a polynomial in $A$? Evidently not, for if we take $A = I$, then $A$ commutes with every matrix and $p(I) = p(1)I$ cannot produce any nonscalar matrix. The problem is that the form of $A$ permits it to commute with many matrices, but permits it to generate only a few matrices of the form $p(A)$. To get any result, some compromise between these two forces must be found.


3.2.4.1 Definition. A matrix $A \in M_n$ is said to be nonderogatory if every eigenvalue of $A$ has geometric multiplicity 1.

Since the geometric multiplicity of an eigenvalue of a Jordan matrix is equal to the number of Jordan blocks corresponding to that eigenvalue, a matrix is nonderogatory if and only if corresponding to each distinct eigenvalue is exactly one Jordan block. A matrix $A \in M_n$ is nonderogatory, for example, if it has $n$ distinct eigenvalues or if it has only one eigenvalue and that eigenvalue has geometric multiplicity 1. A scalar matrix is the antithesis of a nonderogatory matrix.


3.2.4.2 Theorem. Let $A \in M_n$ be a given nonderogatory matrix. A matrix $B \in M_n$ commutes with $A$ if and only if there is a polynomial $p(\cdot)$ of degree at most $n-1$ such that $B = p(A)$.

Proof: If $B = p(A)$, then certainly $A$ commutes with $B$. For the converse, let $A = SJS^{-1}$ be the Jordan canonical form of $A$. If $BA = AB$, then $BSJS^{-1} = SJS^{-1}B$ and $(S^{-1}BS)J = J(S^{-1}BS)$. If we can show that $S^{-1}BS = p(J)$, then $B = Sp(J)S^{-1} = p(SJS^{-1}) = p(A)$. Thus, it suffices to assume that $A$ is itself a Jordan matrix. Since $A$ is nonderogatory, we may assume that

\[ A = \begin{bmatrix} J_{n_1}(\lambda_1) & & & 0 \\ & J_{n_2}(\lambda_2) & & \\ & & \ddots & \\ 0 & & & J_{n_k}(\lambda_k) \end{bmatrix} \]

where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the $k$ distinct eigenvalues of $A$. If we write $B$ in partitioned form $B = (B_{ij})$, which conforms with this decomposition of $A$, then the corresponding off-diagonal blocks of $AB - BA$ are of the form

\[ J_{n_i}(\lambda_i)B_{ij} - B_{ij}J_{n_j}(\lambda_j) = 0, \qquad i \neq j \]

Since the eigenvalues $\lambda_i$ and $\lambda_j$ are different, we can conclude [see Problem 9 in Section (2.4)] that $B_{ij} = 0$ is the unique solution of these equations. The matrix $B$ must therefore be a block diagonal matrix

\[ B = \begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix} \]

with each $B_i \in M_{n_i}$. The commutativity assumption says that $B_iJ_{n_i}(\lambda_i) = J_{n_i}(\lambda_i)B_i$ for all $i = 1, 2, \ldots, k$. If we write $J_{n_i}(\lambda_i) = \lambda_i I + N_i$ with

\[ N_i = \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} \in M_{n_i} \]

these identities become $B_iN_i = N_iB_i$ for $i = 1, 2, \ldots, k$. An explicit calculation shows that, because of the special form of $N_i$, each $B_i$ must be an upper triangular matrix of Toeplitz type (0.9.7), that is,

\[ B_i = \begin{bmatrix} b_1^{(i)} & b_2^{(i)} & \cdots & b_{n_i}^{(i)} \\ & b_1^{(i)} & \ddots & \vdots \\ & & \ddots & b_2^{(i)} \\ 0 & & & b_1^{(i)} \end{bmatrix} \tag{3.2.4.3} \]

where the entries are constant down the diagonals. If we can construct polynomials $p_i(t)$ of degree at most $n-1$ with the property that $p_i(J_{n_j}(\lambda_j)) = 0$ for all $i \neq j$, and $p_i(J_{n_i}(\lambda_i)) = B_i$, then

\[ p(t) = p_1(t) + \cdots + p_k(t) \]

will fulfill the assertions of the theorem. Define

\[ q_i(t) = \prod_{\substack{j=1 \\ j \neq i}}^{k} (t - \lambda_j)^{n_j}, \qquad \text{degree } q_i(t) = n - n_i \]

and observe that $q_i(J_{n_j}(\lambda_j)) = 0$ for all $i \neq j$ because $(J_{n_j}(\lambda_j) - \lambda_j I)^{n_j} = 0$. Although $q_i(J_{n_i}(\lambda_i))$ is not necessarily equal to $B_i$, it is nonsingular (because the $\lambda_i$ are distinct) and, like any polynomial in $J_{n_i}(\lambda_i)$, is of the form (3.2.4.3).

Since the inverse of any nonsingular matrix of the form (3.2.4.3) is of the same form, and since the product of any two matrices of this type is of the same type, the matrix

\[ [q_i(J_{n_i}(\lambda_i))]^{-1}B_i \]

is an upper triangular matrix of Toeplitz type (3.2.4.3). Every such matrix can be written as a polynomial in $J_{n_i}(\lambda_i)$, for example,

\[ B_i = b_1^{(i)}(J_{n_i}(\lambda_i) - \lambda_i I)^0 + b_2^{(i)}(J_{n_i}(\lambda_i) - \lambda_i I)^1 + \cdots + b_{n_i}^{(i)}(J_{n_i}(\lambda_i) - \lambda_i I)^{n_i - 1} \]

Thus, there is a polynomial $r_i(t)$ of degree at most $n_i - 1$ such that

\[ [q_i(J_{n_i}(\lambda_i))]^{-1}B_i = r_i(J_{n_i}(\lambda_i)) \]

If we now set $p_i(t) = q_i(t)r_i(t)$, the degree of $p_i(t)$ is at most $n-1$,

\[ p_i(J_{n_j}(\lambda_j)) = q_i(J_{n_j}(\lambda_j))\,r_i(J_{n_j}(\lambda_j)) = 0 \cdot r_i(J_{n_j}(\lambda_j)) = 0 \]

if $i \neq j$, and $p_i(J_{n_i}(\lambda_i)) = q_i(J_{n_i}(\lambda_i))\,r_i(J_{n_i}(\lambda_i)) = B_i$. □
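
One step of this proof, the fact that an upper triangular Toeplitz matrix is a polynomial in the Jordan block and therefore commutes with it, is easy to check on an example. The following sketch uses integer entries; the helper names `toeplitz_upper` and `polynomial_in_block` and the sample coefficients are hypothetical choices made only for illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def jordan_block(lam, n):
    """The n-by-n Jordan block with eigenvalue lam."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]

def toeplitz_upper(b):
    """Upper triangular Toeplitz matrix with first row b[0], b[1], ..., b[n-1]."""
    n = len(b)
    return [[b[j - i] if j >= i else 0 for j in range(n)] for i in range(n)]

def polynomial_in_block(b, lam):
    """Evaluate sum_j b[j] * (J - lam*I)^j for J = J_n(lam), n = len(b)."""
    n = len(b)
    J = jordan_block(lam, n)
    N = [[J[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]   # N^0 = I
    total = [[0] * n for _ in range(n)]
    for coeff in b:
        total = [[total[i][j] + coeff * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = matmul(power, N)
    return total

b, lam = [7, -2, 3, 4], 5
B = toeplitz_upper(b)
J = jordan_block(lam, len(b))
print(polynomial_in_block(b, lam) == B)      # B is a polynomial in J
print(matmul(B, J) == matmul(J, B))          # so B commutes with J
```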


The converse of the theorem is also true, and leads to a characterization of nonderogatory matrices: A matrix $A \in M_n$ is nonderogatory if and only if every matrix that commutes with $A$ is a polynomial in $A$.


3.2.5 Convergent matrices. A matrix $A \in M_n$ with the property that all elements of $A^m$ tend to zero as $m \to \infty$ is called a convergent matrix. Such matrices play an important role in the analysis of algorithms in numerical linear algebra. If $A$ is a diagonal matrix, then it is apparent that $A$ is convergent if and only if all the eigenvalues of $A$ are of modulus strictly less than 1, and the same reasoning extends to diagonalizable matrices. Because of the limiting operation involved, it is not clear how to use a perturbation argument to extend this result to the general case of not necessarily diagonalizable matrices. We may use the Jordan canonical form, however. If $A = SJS^{-1}$ is the Jordan canonical form of $A$, then $A^m = SJ^mS^{-1}$, so $A^m \to 0$ as $m \to \infty$ if and only if $J^m \to 0$ as $m \to \infty$. Since $J$ is a direct sum of Jordan blocks, it suffices to consider the behavior of powers of a Jordan block

\[ J_k(\lambda) = \begin{bmatrix} \lambda & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix} = \begin{bmatrix} \lambda & & 0 \\ & \ddots & \\ 0 & & \lambda \end{bmatrix} + \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} = \lambda I + N_k \in M_k, \quad \text{where } N_k = J_k(0) \]

Since $N_k^m = 0$ for all $m \geq k$, we have

\[ J_k(\lambda)^m = (\lambda I + N_k)^m = \sum_{i=0}^{m} \binom{m}{i} \lambda^i N_k^{m-i} = \sum_{i=m-k+1}^{m} \binom{m}{i} \lambda^i N_k^{m-i} \]

for all $m \geq k$. Since the diagonal elements are all $\lambda^m$, if $J^m \to 0$ it is necessary that $\lambda^m \to 0$, which means that $|\lambda| < 1$. Conversely, if $|\lambda| < 1$, we would like to prove that

\[ \binom{m}{m-j} \lambda^{m-j} \to 0 \quad \text{as } m \to \infty \text{ for each } j = 0, 1, 2, \ldots, k-1 \]

But

\[ \left| \binom{m}{m-j} \lambda^{m-j} \right| = \left| \frac{m(m-1)(m-2)\cdots(m-j+1)\,\lambda^m}{j!\,\lambda^j} \right| \leq \left| \frac{m^j \lambda^m}{j!\,\lambda^j} \right| \]

so it will suffice to show that $m^j|\lambda|^m \to 0$ as $m \to \infty$. An easy way to see this is to take logarithms and observe that

\[ j \log m + m \log|\lambda| \to -\infty \]

as $m \to \infty$ because $\log|\lambda| < 0$ and $(\log x)/x \to 0$ as $x \to \infty$ by l'Hôpital's rule.

This argument has made essential use of the Jordan canonical form of $A$ to show that $A^m \to 0$ as $m \to \infty$ if and only if all the eigenvalues of $A$ have modulus less than 1. Another proof, which is completely independent of the Jordan canonical form, is given in (5.6.12).
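
The binomial expansion says that entry $(i, i+j)$ of $J_k(\lambda)^m$ is $\binom{m}{j}\lambda^{m-j}$. The sketch below compares that formula with a directly computed power and then looks at the size of the entries for $|\lambda| < 1$. It is an illustration only, in exact rational arithmetic; the function names and the sample values $\lambda = 1/2$, $k = 3$ are hypothetical.

```python
from fractions import Fraction
from math import comb

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def jordan_block(lam, n):
    """The n-by-n Jordan block with eigenvalue lam."""
    return [[lam if i == j else (1 if j == i + 1 else 0) for j in range(n)]
            for i in range(n)]

def block_power_formula(lam, k, m):
    """J_k(lam)^m from the binomial expansion: entry (i, i+j) is C(m, j) * lam^(m-j)."""
    lam = Fraction(lam)
    return [[comb(m, j - i) * lam ** (m - (j - i)) if 0 <= j - i <= m else Fraction(0)
             for j in range(k)] for i in range(k)]

def block_power_direct(lam, k, m):
    """J_k(lam)^m by repeated multiplication."""
    J = jordan_block(Fraction(lam), k)
    P = [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]
    for _ in range(m):
        P = matmul(P, J)
    return P

lam, k = Fraction(1, 2), 3
print(block_power_formula(lam, k, 10) == block_power_direct(lam, k, 10))

# The polynomial factor C(m, j) loses to the geometric factor lam^(m-j).
for m in (10, 50, 200):
    largest = max(abs(x) for row in block_power_formula(lam, k, m) for x in row)
    print(m, float(largest))
```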


3.2.6 The geometric multiplicity-algebraic multiplicity inequality. The geometric multiplicity of an eigenvalue of a given matrix $A \in M_n$ is the number of Jordan blocks of $A$ corresponding to the eigenvalue. This number is less than or equal to the sum of the orders of all the Jordan blocks corresponding to the eigenvalue. This sum is the algebraic multiplicity. Thus, the geometric multiplicity of an eigenvalue is not greater than its algebraic multiplicity; compare with (1.4.9).


3.2.7 Diagonalizable and nilpotent matrices. A matrix $A \in M_n$ is nilpotent if $A^k = 0$ for some positive integer $k$. Any Jordan block $J_k(\lambda)$ can be written as $J_k(\lambda) = \lambda I + N_k$, where $(N_k)^k = 0$. Thus, any Jordan block is the sum of a diagonal matrix and a nilpotent matrix.

More generally, a Jordan matrix (3.2.1.1) can be written as $J = D + N$, where $D$ is a diagonal matrix whose main diagonal is the same as that of $J$, and $N = J - D$. The matrix $N$ is nilpotent, and $N^k = 0$ if $k$ is the order of the largest Jordan block in $J$.

Finally, if $A \in M_n$ is any given matrix and $A = SJS^{-1}$ is in Jordan canonical form, then $A = SDS^{-1} + SNS^{-1} = A_D + A_N$, where $A_D$ is diagonalizable, $A_N$ is nilpotent, and $A_DA_N = A_NA_D$ because both $D$ and $N$ are block diagonal matrices with respective blocks of the same size, and the blocks in $D$ are scalar matrices.

We conclude that any $A \in M_n$ can be written as the sum of a diagonalizable matrix and a nilpotent matrix in such a way that these summands commute.


Problems 


1. Let $\mathcal{F} = \{A_\alpha : \alpha \in \mathcal{I}\} \subset M_n$ be a given family of matrices, indexed by the index set $\mathcal{I}$, and suppose there is a nonderogatory matrix $A_0 \in \mathcal{F}$ such that $A_\alpha A_0 = A_0 A_\alpha$ for all $\alpha \in \mathcal{I}$. Show that for every $\alpha \in \mathcal{I}$ there is a polynomial $p_\alpha(t)$ of degree at most $n-1$ such that $A_\alpha = p_\alpha(A_0)$, and hence $\mathcal{F}$ is a commuting family.

2. Let $A \in M_n$ be given, and let $\lambda_i$ be an eigenvalue of $A$. Show that the order of the largest Jordan block of $A$ corresponding to the eigenvalue $\lambda_i$ (the index of $\lambda_i$) is the smallest value of $k = 1, 2, \ldots, n$ for which $\operatorname{rank}(A - \lambda_i I)^k = \operatorname{rank}(A - \lambda_i I)^{k+1}$.

3. If $A \in M_n$ has $A^k = 0$ for some $k > n$, show that $A^r = 0$ for some $r \leq n$. Thus, every nilpotent matrix has a vanishing power that is not greater than the order of the matrix. Hint: Show that 0 is the only eigenvalue of $A$. What does the Jordan canonical form of $A$ look like? Now take powers.


4. Let $J_k(0)$ be a given Jordan block. Use the argument at the end of (3.2.1) to determine the three possible Jordan canonical forms of the matrix $J_k^3(0)$.


5. Let $A \in M_n$ be nilpotent, so $A^k = 0$ for some $k$. Show that the characteristic polynomial of $A$ is $p_A(t) = t^n$.

6. The linear transformation $d/dt : p(t) \to p'(t)$ acting on the vector space of all polynomials with degree at most 3 has the basis representation

\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

in the basis $\mathcal{B} = \{1, t, t^2, t^3\}$. What is the Jordan canonical form of this matrix?


7. What are the possible Jordan forms of a matrix A eM, such that 
£= 


8. What are the possible Jordan canonical forms for a matrix A € Me 
with characteristic polynomial pa (t) = (¢+3 MIUS 4)? 


9. Use the method described in (3.2.1) to determine the Jordan canon- 
ical form of 


pF l 
4-1) | 
1-1 


10. Verify the assertion in the proof of Theorem (3.2.4.2) that products 
of matrices of the form (3.2.4.3) have the same form. Deduce that the 
inverse of a nonsingular matrix of the form (3.2.4.3) has the same form. 
Hint: The inverse of A is a polynomial in A. 


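11. Let $A, B \in M_n$. Use the identities in the proof of Theorem (1.3.20) to show that the nonsingular blocks in the Jordan forms of $AB$ and $BA$ are identical. Does this mean that $AB$ and $BA$ are similar? If $AB$ and $BA$ are not similar, discuss how far from similar they can be.

12. Suppose that $A_1, \ldots, A_k$ are given matrices with $A_i \in M_{n_i}$ for $i = 1, 2, \ldots, k$, and suppose that $J_1, \ldots, J_k$ are their respective Jordan canonical forms. Show that the direct sum

\[ \begin{bmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_k \end{bmatrix} \in M_n, \qquad n_1 + n_2 + \cdots + n_k = n \]

has (up to permutation of diagonal subblocks) the Jordan canonical form

\[ \begin{bmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_k \end{bmatrix} \]

13. Let $A \in M_n$ and $B, C \in M_m$ be given. Show that the direct sum $\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \in M_{n+m}$ is similar to the direct sum $\begin{bmatrix} A & 0 \\ 0 & C \end{bmatrix}$ if and only if $B$ is similar to $C$.

14. Let $B, C \in M_m$ be given. Show that the two $k$-fold direct sums

\[ \begin{bmatrix} B & & 0 \\ & \ddots & \\ 0 & & B \end{bmatrix} \in M_{km} \quad \text{and} \quad \begin{bmatrix} C & & 0 \\ & \ddots & \\ 0 & & C \end{bmatrix} \in M_{km}, \qquad k \geq 1 \]

are similar if and only if $B$ and $C$ are similar.

15. Let $A \in M_n$ and $B, C \in M_m$ be given. Show that the direct sums

\[ \begin{bmatrix} A & & & 0 \\ & B & & \\ & & \ddots & \\ 0 & & & B \end{bmatrix} \in M_{n+km} \quad \text{and} \quad \begin{bmatrix} A & & & 0 \\ & C & & \\ & & \ddots & \\ 0 & & & C \end{bmatrix} \in M_{n+km}, \qquad k \geq 1 \]

are similar if and only if $B$ and $C$ are similar.

16. Let $A \in M_n$ have Jordan canonical form $J_{n_1}(\lambda_1) \oplus \cdots \oplus J_{n_k}(\lambda_k)$. If $A$ is nonsingular, show that the Jordan canonical form of $A^2$ is $J_{n_1}(\lambda_1^2) \oplus \cdots \oplus J_{n_k}(\lambda_k^2)$; that is, the Jordan canonical form of $A^2$ is composed of precisely the same collection of Jordan blocks as $A$, but the respective eigenvalues are squared. Is a statement like this true for all powers $A^m$, $m \geq 2$? Give a 2-by-2 example to show this is false if $A$ is singular. Hint: If $\lambda \neq 0$, show that the Jordan canonical form of $J_k^2(\lambda)$ (the square of a simple Jordan block) is $J_k(\lambda^2)$. Show that

\[ \operatorname{rank}[J_k^2(\lambda) - \lambda^2 I]^m = \operatorname{rank}[J_k(\lambda) - \lambda I]^m, \qquad m = 1, 2, \ldots, k, \quad \text{if } \lambda \neq 0 \]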


17. If $A \in M_n$, show that $\operatorname{rank} A = \operatorname{rank} A^2$ if and only if the geometric and algebraic multiplicities of the eigenvalue $\lambda = 0$ are equal; that is, all the Jordan blocks corresponding to $\lambda = 0$ (if any) in the Jordan canonical form of $A$ are 1-by-1.

Further Readings. For a proof of the fact that the similarity matrix between a given matrix and its transpose may always be taken to be symmetric, see O. Taussky and H. Zassenhaus, "On the Similarity Transformation between a Matrix and Its Transpose," Pacific J. Math. 9 (1959), 893-896. See [HJ] for a proof of the converse to Theorem (3.2.4.2) mentioned at the end of Section (3.2.4).


3.3 Polynomials and matrices: the minimal polynomial

If $p(t) = t^k + a_{k-1}t^{k-1} + a_{k-2}t^{k-2} + \cdots + a_1t + a_0$ is a given polynomial, then one can always define

\[ p(A) = A^k + a_{k-1}A^{k-1} + a_{k-2}A^{k-2} + \cdots + a_1A + a_0I \]

for any $A \in M_n$. There is an important interplay between polynomials and matrices. The vital role of the characteristic polynomial has already been observed, but there are other polynomials associated with a square matrix. One of these is the minimal polynomial.

The Cayley-Hamilton theorem (2.4.2) guarantees that for each $A \in M_n$ there is a polynomial (the characteristic polynomial) $p_A(t)$ of degree $n$ such that $p_A(A) = 0$. A polynomial whose value is the 0 matrix at $A$ is said to annihilate $A$. There may also be a polynomial of degree $n-1$ which annihilates $A$, or one of degree $n-2$, but it is clear that, since there are only finitely many possibilities, for each $A \in M_n$ there is a polynomial of minimum degree that annihilates $A$, and this minimum degree is at most $n$. If $p(A) = 0$, then $cp(A) = 0$ for any $c \in \mathbf{C}$, so it is clear that we may always normalize a nontrivial annihilating polynomial so that the coefficient of the highest-order term is +1. A polynomial whose highest-order term has coefficient +1 is said to be monic. Notice that a monic polynomial cannot be identically zero.

3.3.1 Theorem. Let $A \in M_n$ be given. There exists a unique monic polynomial $q_A(t)$ of minimum degree that annihilates $A$. The degree of this polynomial is at most $n$. If $p(t)$ is any polynomial such that $p(A) = 0$, then $q_A(t)$ divides $p(t)$.

Proof: The characteristic polynomial is an example of a polynomial of degree $n$ that annihilates $A$, so there is a minimum positive integer $m \leq n$ such that there exists a monic polynomial $q(t)$ of degree $m$ with $q(A) = 0$. If $p(t)$ annihilates $A$, and if $q(t)$ is a monic polynomial of minimum degree that annihilates $A$, then the degree of $q(t)$ must be less than or equal to the degree of $p(t)$. By the Euclidean algorithm, therefore, there exists a polynomial $h(t)$ and a polynomial $r(t)$ of degree less than that of $q(t)$ such that $p(t) = q(t)h(t) + r(t)$. But $0 = p(A) = q(A)h(A) + r(A) = 0 \cdot h(A) + r(A)$, so $r(A) = 0$. If $r(t) \not\equiv 0$, we could normalize it and obtain a monic polynomial of degree less than that of $q(t)$ that annihilates $A$. Since this would contradict the minimal property of $q(t)$, we conclude that $r(t) \equiv 0$, and hence that $q(t)$ divides $p(t)$ with quotient $h(t)$. If there are two monic polynomials of minimum degree that annihilate $A$, this argument shows that each divides the other; since the degrees are the same, one must be a scalar multiple of the other. But since both are monic, the scalar factor must be +1 and they are identical. □

3.3.2 Definition. Let $A \in M_n$ be given. The unique monic polynomial $q_A(t)$ of minimum degree that annihilates $A$ is called the minimal polynomial of $A$.

3.3.3 Corollary. Similar matrices have the same minimal polynomial.

Proof: If $A, B, S \in M_n$ and if $A = SBS^{-1}$, then $q_B(A) = q_B(SBS^{-1}) = Sq_B(B)S^{-1} = 0$, so the degree of $q_B(t)$ is not less than the degree of $q_A(t)$. But $B = S^{-1}AS$, so the same argument shows that the degree of $q_A(t)$ is not less than the degree of $q_B(t)$. Thus, these two monic polynomials have the same minimal degree and both annihilate $A$, so they must be identical by Theorem (3.3.1). □

3.3.4 Corollary. For every $A \in M_n$, the minimal polynomial $q_A(t)$ divides the characteristic polynomial $p_A(t)$. Moreover, $q_A(\lambda) = 0$ if and only if $\lambda$ is an eigenvalue of $A$, so every root of $p_A(t) = 0$ is a root of $q_A(t) = 0$.

Proof: Since $p_A(A) = 0$, the fact that there is a polynomial $h(t)$ such that $p_A(t) = h(t)q_A(t)$ follows from the theorem. This factorization makes it clear that every root of $q_A(t) = 0$ is a root of $p_A(t) = 0$, and hence every root of $q_A(t) = 0$ is an eigenvalue of $A$. If $\lambda$ is an eigenvalue of $A$, and if $x \neq 0$ is an associated eigenvector, then $Ax = \lambda x$ and $0 = q_A(A)x = q_A(\lambda)x$, so $q_A(\lambda) = 0$. □

This last corollary shows that if the characteristic polynomial $p_A(t)$ has been completely factored as

\[ p_A(t) = \prod_{i=1}^{m} (t - \lambda_i)^{s_i}, \qquad 1 \leq s_i \leq n, \quad s_1 + s_2 + \cdots + s_m = n \tag{3.3.5a} \]

with $\lambda_1, \lambda_2, \ldots, \lambda_m$ distinct, then the minimal polynomial $q_A(t)$ must have the form

\[ q_A(t) = \prod_{i=1}^{m} (t - \lambda_i)^{r_i}, \qquad 1 \leq r_i \leq s_i \tag{3.3.5b} \]

In principle, this gives an algorithm for finding the minimal polynomial of a given matrix $A$:


1. First compute the eigenvalues of $A$, together with their algebraic multiplicities, perhaps by finding the characteristic polynomial and factoring it completely. By some means, determine the factorization (3.3.5a).

2. There are finitely many polynomials of the form of the product in (3.3.5b). Starting with the product in which all $r_i = 1$, determine by explicit calculation the one of minimal degree that annihilates $A$. This will be the minimal polynomial.
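
The two steps above can be sketched as a small brute-force search. This is an illustration for hand-sized examples only: it assumes the eigenvalues and their algebraic multiplicities are already known exactly (as integers or rationals), and the function names `annihilates` and `minimal_polynomial_exponents` are hypothetical. Candidate exponent tuples are tried in order of increasing total degree, so the first one that annihilates the matrix gives the minimal polynomial.

```python
from fractions import Fraction
from itertools import product

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def annihilates(A, exponents):
    """True if prod_i (A - lam_i I)^(r_i) = 0, where exponents maps lam_i -> r_i."""
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for lam, r in exponents.items():
        shifted = [[Fraction(A[i][j]) - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        for _ in range(r):
            P = matmul(P, shifted)
    return all(x == 0 for row in P for x in row)

def minimal_polynomial_exponents(A, multiplicities):
    """Given {eigenvalue: algebraic multiplicity s_i}, return {eigenvalue: r_i}."""
    eigenvalues = list(multiplicities)
    candidates = product(*(range(1, multiplicities[lam] + 1) for lam in eigenvalues))
    for exps in sorted(candidates, key=sum):      # lowest degree first
        trial = dict(zip(eigenvalues, exps))
        if annihilates(A, trial):
            return trial
    raise ValueError("no candidate annihilates A; check the eigenvalue data")

# Characteristic polynomial (t-2)^2 (t-3) in both cases.
A1 = [[2, 1, 0], [0, 2, 0], [0, 0, 3]]    # one 2-by-2 block for eigenvalue 2
A2 = [[2, 0, 0], [0, 2, 0], [0, 0, 3]]    # diagonal
print(minimal_polynomial_exponents(A1, {2: 2, 3: 1}))   # {2: 2, 3: 1}
print(minimal_polynomial_exponents(A2, {2: 2, 3: 1}))   # {2: 1, 3: 1}
```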


Numerically, this is not a good algorithm if it involves factoring the char- 
acteristic polynomial of a large matrix, but it can be very effective for 
hand calculations involving small matrices of simple form. Another 
approach to computing the minimal polynomial that does not involve 
knowing either the characteristic polynomial or the eigenvalues is out- 
lined in Problem 5 at the end of this section. 
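
The two-step trial procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the eigenvalues and their algebraic multiplicities are supplied by the caller, the entries are integers or exact fractions so that the test for a zero matrix is exact, and the helper names (`mat_mul`, `shift`, `minimal_exponents`) are hypothetical.

```python
from itertools import product

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(A, lam):
    """Return A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def minimal_exponents(A, factorization):
    """factorization: list of (eigenvalue, algebraic multiplicity s_i).

    Returns the exponents (r_1, ..., r_m) in (3.3.5b), found by trying the
    candidate products in order of increasing total degree.
    """
    n = len(A)
    eigenvalues = [lam for lam, _ in factorization]
    ranges = [range(1, s + 1) for _, s in factorization]
    for exponents in sorted(product(*ranges), key=sum):
        P = [[int(i == j) for j in range(n)] for i in range(n)]
        for lam, r in zip(eigenvalues, exponents):
            for _ in range(r):
                P = mat_mul(P, shift(A, lam))
        if all(entry == 0 for row in P for entry in row):
            return exponents
    raise ValueError("the supplied factorization does not match the matrix")

# Hypothetical example: eigenvalue 2 with multiplicity 2, eigenvalue 3 with multiplicity 1.
A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
print(minimal_exponents(A, [(2, 2), (3, 1)]))  # (2, 1)
```

For this matrix the product with all exponents equal to 1 does not annihilate $A$, so the search moves on to $(t-2)^2(t-3)$. With floating-point entries the exact zero test would have to be replaced by a tolerance.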

There is an intimate connection between the Jordan canonical form of
$A$ and the minimal polynomial of $A$. Suppose $A = SJS^{-1}$ is the Jordan
canonical form of $A$, and suppose first that

$$J = \begin{bmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix} \in M_n$$

is only a single Jordan block. The characteristic polynomial of $A$ is
$(t-\lambda)^n$, and since $(J - \lambda I)^k \ne 0$ if $k < n$, the minimal polynomial is also
$(t-\lambda)^n$. If

$$J = \begin{bmatrix} J_{n_1}(\lambda) & 0 \\ 0 & J_{n_2}(\lambda) \end{bmatrix} \in M_n$$

with $n_1 \ge n_2$, then the characteristic polynomial of $J$ is still $(t-\lambda)^n$,
but now $(J - \lambda I)^{n_1} = 0$ and no lower power vanishes. The minimal polynomial
is therefore $(t-\lambda)^{n_1}$. If there are more blocks, the result is the
same: The minimal polynomial of $J$ is $(t-\lambda)^r$, where $r$ is the order of the
largest Jordan block corresponding to $\lambda$. If $J$ is a general Jordan matrix,
the minimal polynomial must contain a factor $(t - \lambda_i)^{r_i}$ for each distinct
eigenvalue $\lambda_i$, and $r_i$ must be the order of the largest Jordan block
corresponding to $\lambda_i$; no smaller power will annihilate all the Jordan blocks
corresponding to $\lambda_i$, and no greater power is needed. Since similar matrices
have the same minimal polynomial, we have proved the following theorem.


3.3.6 Theorem. Let $A \in M_n$ be a given matrix whose distinct eigenvalues
are $\lambda_1, \lambda_2, \ldots, \lambda_m$. The minimal polynomial of $A$ is

$$q_A(t) = \prod_{i=1}^{m} (t - \lambda_i)^{r_i} \tag{3.3.7}$$

where $r_i$ is the order of the largest Jordan block of $A$ corresponding to
the eigenvalue $\lambda_i$.
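
As an illustration, the exponent $r_i$ can be found without forming the Jordan canonical form, using the rank test stated in Problem 2 at the end of this section: $r_i$ is the smallest $k$ for which the ranks of $(A - \lambda_i I)^k$ and $(A - \lambda_i I)^{k+1}$ agree. The sketch below assumes that `lam` really is an eigenvalue and that the entries are integers or fractions; the function names are hypothetical.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def largest_block_order(A, lam):
    """Smallest k with rank (A - lam I)^k == rank (A - lam I)^(k+1)."""
    n = len(A)
    N = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power, k = N, 1
    while True:
        next_power = mat_mul(power, N)
        if rank(power) == rank(next_power):
            return k
        power, k = next_power, k + 1

# Hypothetical example: Jordan blocks of orders 3 and 1 for eigenvalue 5, order 2 for eigenvalue 7.
A = [[5, 1, 0, 0, 0, 0],
     [0, 5, 1, 0, 0, 0],
     [0, 0, 5, 0, 0, 0],
     [0, 0, 0, 5, 0, 0],
     [0, 0, 0, 0, 7, 1],
     [0, 0, 0, 0, 0, 7]]
print(largest_block_order(A, 5), largest_block_order(A, 7))  # 3 2
```

For this matrix the minimal polynomial is therefore $(t-5)^3(t-7)^2$, in agreement with (3.3.7).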

In practice, this result is not very helpful in computing the minimal
polynomial since it is usually harder to determine the Jordan canonical
form of a matrix than it is to determine its minimal polynomial. After
all, if only the eigenvalues of a matrix are known, its minimal polynomial
can be determined by simple trial and error. There are important theoretical
consequences, however. Since a matrix is diagonalizable if and
only if all its Jordan blocks have order 1, a necessary and sufficient
condition for diagonalizability is that all $r_i = 1$ in (3.3.7).


3.3.8 Corollary. Let $A \in M_n$ be a given matrix whose distinct eigenvalues
are $\lambda_1, \lambda_2, \ldots, \lambda_m$. Then $A$ is diagonalizable if and only if $q(A) = 0$,
where

$$q(t) = (t - \lambda_1)(t - \lambda_2) \cdots (t - \lambda_m) \tag{3.3.9}$$


This criterion is actually useful for determining if a given matrix is 
diagonalizable, for if one knows the eigenvalues of a given matrix, it is 
easy to form the polynomial (3.3.9) and see if it annihilates A. If it does, 
it must be the minimal polynomial of A, since no lower-order polynomial 
could have as roots all the m distinct eigenvalues of A. It is sometimes 
useful to have this result formulated in several equivalent ways: 
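
A minimal sketch of this test, assuming the distinct eigenvalues are already known and the entries are exact (integers or fractions), multiplies the factors $A - \lambda_i I$ together and checks whether the product vanishes. The function name `is_diagonalizable` is hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_diagonalizable(A, distinct_eigenvalues):
    """Evaluate q(A) for q(t) = (t - l_1)...(t - l_m) and test q(A) == 0."""
    n = len(A)
    Q = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam in distinct_eigenvalues:
        factor = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                  for i in range(n)]
        Q = mat_mul(Q, factor)
    return all(entry == 0 for row in Q for entry in row)

print(is_diagonalizable([[1, 1], [0, 2]], [1, 2]))  # True
print(is_diagonalizable([[2, 1], [0, 2]], [2]))     # False
```

The second matrix is a single 2-by-2 Jordan block, so $(t-2)$ alone cannot annihilate it.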


3.3.10 Corollary. Let $A \in M_n$ be given. Each of the following is a necessary
and sufficient condition for $A$ to be diagonalizable:


(a) The minimal polynomial $q_A(t)$ has distinct linear factors.
(b) Every root of $q_A(t) = 0$ has multiplicity 1.
(c) For all $t$ such that $q_A(t) = 0$, the derivative $q_A'(t) \ne 0$.




We have been considering the problem of finding, for a given matrix
$A \in M_n$, a monic polynomial of minimum degree that annihilates $A$. But
what about the converse? Given a monic polynomial

$$p(t) = t^n + a_{n-1}t^{n-1} + a_{n-2}t^{n-2} + \cdots + a_1 t + a_0 \tag{3.3.11}$$

is there a matrix $A$ for which $p(t)$ is the minimal polynomial? If so, the
size of $A$ must be at least $n$-by-$n$; it is not hard to find such a matrix
$A \in M_n$. Consider the matrix

$$A = \begin{bmatrix} 0 & & & & -a_0 \\ 1 & 0 & & & -a_1 \\ & 1 & \ddots & & \vdots \\ & & \ddots & 0 & -a_{n-2} \\ 0 & & & 1 & -a_{n-1} \end{bmatrix} \in M_n \tag{3.3.12}$$

and observe that

$$\begin{aligned}
Ie_1 &= e_1 = A^0 e_1 \\
Ae_1 &= e_2 = Ae_1 \\
Ae_2 &= e_3 = A^2 e_1 \\
Ae_3 &= e_4 = A^3 e_1 \\
&\;\;\vdots \\
Ae_{n-1} &= e_n = A^{n-1} e_1 \\
Ae_n &= -a_{n-1}e_n - a_{n-2}e_{n-1} - \cdots - a_1 e_2 - a_0 e_1 \\
&= -a_{n-1}A^{n-1}e_1 - a_{n-2}A^{n-2}e_1 - \cdots - a_1 Ae_1 - a_0 e_1 = A^n e_1 \\
&= [A^n - p(A)]e_1
\end{aligned}$$

Thus,

$$\begin{aligned}
p(A)e_1 &= (a_0 e_1 + a_1 Ae_1 + a_2 A^2 e_1 + \cdots + a_{n-1}A^{n-1}e_1) + A^n e_1 \\
&= [p(A) - A^n]e_1 + [A^n - p(A)]e_1 = 0
\end{aligned}$$

Furthermore, $p(A)e_k = p(A)A^{k-1}e_1 = A^{k-1}p(A)e_1 = A^{k-1}0 = 0$ for each
$k = 1, 2, \ldots, n$. Since $p(A)e_k = 0$ for every basis vector $e_k$, we conclude
that $p(A) = 0$. Thus $p(t)$ is a monic polynomial of degree $n$ that annihilates
$A$. If there were a polynomial $q(t) = t^m + b_{m-1}t^{m-1} + \cdots + b_1 t + b_0$
of lower degree $m < n$ that annihilates $A$, then

$$\begin{aligned}
0 = q(A)e_1 &= A^m e_1 + b_{m-1}A^{m-1}e_1 + \cdots + b_1 Ae_1 + b_0 e_1 \\
&= e_{m+1} + b_{m-1}e_m + \cdots + b_1 e_2 + b_0 e_1 = 0
\end{aligned}$$

which would imply that the basis vector $e_{m+1}$ is linearly dependent on the
basis vectors $e_1, e_2, \ldots, e_m$. Since this is impossible, we conclude that $p(t)$
is the unique monic polynomial of minimum order that annihilates $A$.
Moreover, since $p(t)$ has degree $n$, $A \in M_n$, and the characteristic polynomial
$p_A(t)$ is a monic polynomial of degree $n$ that also annihilates $A$,
(3.3.11) must be the characteristic polynomial of (3.3.12).


3.3.13 Definition. The matrix (3.3.12) is known as the companion 
matrix of the polynomial (3.3.11). 


We have proved the following: 


3.3.14 Theorem. Every monic polynomial is both the minimal poly- 
nomial and the characteristic polynomial of its companion matrix. 
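
The construction is easy to try numerically. The following sketch builds the companion matrix in the convention of (3.3.12), with ones on the subdiagonal and the negated coefficients in the last column, and evaluates $p$ at it by Horner's rule. The function names and the sample cubic are illustrative choices, not part of the text.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def companion(coeffs):
    """coeffs = [a_0, a_1, ..., a_{n-1}] for t^n + a_{n-1} t^{n-1} + ... + a_0."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1            # subdiagonal of ones
    for i in range(n):
        C[i][n - 1] = -coeffs[i]   # last column holds -a_0, ..., -a_{n-1}
    return C

def poly_at_matrix(coeffs, A):
    """Evaluate the monic polynomial at A using Horner's rule."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # leading coefficient 1
    for a in reversed(coeffs):
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += a
    return result

# p(t) = t^3 - 6 t^2 + 11 t - 6 = (t - 1)(t - 2)(t - 3)
coeffs = [-6, 11, -6]
C = companion(coeffs)
print(C)                          # [[0, 0, 6], [1, 0, -11], [0, 1, 6]]
print(poly_at_matrix(coeffs, C))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```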

Later, we shall develop methods to determine regions that contain the 
eigenvalues of a matrix. Since the zeroes of a polynomial are the eigen- 
values of its companion matrix, these methods can be used to locate the 
zeroes of a polynomial. See Section (5.6). 

If $A \in M_n$ is a given matrix, one can compute the characteristic polynomial
$p_A(t)$ and the companion matrix (3.3.12) of the polynomial
$p_A(t)$. If $A$ is similar to this companion matrix, then (since similar matrices
have the same minimal polynomial) it follows from (3.3.14) that
the minimal polynomial $q_A(t)$ of $A$ must be identical to the characteristic
polynomial $p_A(t)$. This will not be the case in general, but if
$A \in M_n$ is a matrix whose minimal polynomial $q_A(t)$ and characteristic
polynomial $p_A(t)$ are identical, then the Jordan canonical form (3.1.12)
of $A$ must contain exactly one Jordan block for each distinct eigenvalue.
The size of each Jordan block is equal to the multiplicity of the
corresponding eigenvalue as a zero of the characteristic (minimal) polynomial
of $A$. But the Jordan canonical form of the companion matrix
of the polynomial $p_A(t)$ has exactly the same Jordan block structure,
and hence it must be similar to $A$. This argument is a proof of the
following:


3.3.15 Theorem. A matrix $A \in M_n$ is similar to the companion matrix of
its characteristic polynomial if and only if the minimal and characteristic
polynomials of $A$ are identical.


Exercise. Show that $A \in M_n$ is similar to the companion matrix of its
characteristic polynomial if and only if $A$ is nonderogatory.


Problems

1. Let $A, B \in M_3$ be nilpotent. Show that $A$ and $B$ are similar if and only
if $A$ and $B$ have the same minimal polynomial. Is this true in $M_4$?

2. Suppose $A \in M_n$ is given and suppose the distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_m$ of $A$ are known. Use (3.3.6) to show that the minimal polynomial
(3.3.7) is determined by the following algorithm: For each $i = 1, 2, \ldots, m$
compute $(A - \lambda_i I)^k$ for $k = 1, 2, \ldots, n$. Let $r_i$ be the smallest
value of $k$ for which $\operatorname{rank}(A - \lambda_i I)^k = \operatorname{rank}(A - \lambda_i I)^{k+1}$. This number $r_i$
is known as the index of the eigenvalue $\lambda_i$.

3. A matrix $A \in M_n$ is idempotent if $A^2 = A$. Use (3.3.10) to show that
every idempotent matrix is diagonalizable. Hint: Show that $t^2 - t = t(t-1)$
annihilates $A$. What is the minimal polynomial of $A$? What can
you say if $A$ is tripotent ($A^3 = A$)? What if $A^k = A$?

4. If $A \in M_n$ has $A^k = 0$ for some $k > n$, show that $A^r = 0$ for some $r \le n$.
Thus, every nilpotent matrix has a vanishing power that is not greater
than the order of the matrix. Hint: If $p(t) = t^k$ annihilates $A$, what does
(3.3.1) say about the minimal polynomial?

5. Show that the following application of the Gram-Schmidt process
permits one to compute directly the minimal polynomial of a given matrix
$A \in M_n$ without knowing either the characteristic polynomial of $A$ or
any of its eigenvalues.

(a) Let the mapping $T: M_n \to C^{n^2}$ be defined as follows: For any $A \in M_n$
partitioned according to columns as $A = [a_1\ a_2\ \ldots\ a_n]$, let $T(A)$ denote
the unique vector in $C^{n^2}$ whose first $n$ entries are the entries of the first
column $a_1$, whose entries from $n+1$ to $2n$ are the entries of the second
column $a_2$, and so forth. Show that this mapping $T$ is an isomorphism
(linear, one-to-one, and onto) of the vector spaces $M_n$ and $C^{n^2}$.

(b) Consider the vectors

$$v_0 = T(I),\ v_1 = T(A),\ v_2 = T(A^2),\ \ldots,\ v_k = T(A^k),\ \ldots$$

in $C^{n^2}$ for $k = 0, 1, 2, \ldots$. Use the Cayley-Hamilton theorem to show
that $\{v_0, v_1, \ldots, v_n\}$ is a dependent set.

(c) Apply the Gram-Schmidt process to the set $\{v_0, v_1, \ldots, v_n\}$ in the
given order until it stops by producing a first zero vector. Why must a
zero vector be produced?

(d) If the Gram-Schmidt process produces a first zero vector at the $k$th
step, argue that $k-1$ is the degree of the minimal polynomial of $A$.

(e) If the $k$th step of the Gram-Schmidt process produces the vector
$\alpha_0 v_0 + \alpha_1 v_1 + \cdots + \alpha_{k-1} v_{k-1} = 0$, show that

$$T^{-1}(\alpha_0 v_0 + \alpha_1 v_1 + \cdots + \alpha_{k-1} v_{k-1}) = \alpha_0 I + \alpha_1 A + \alpha_2 A^2 + \cdots + \alpha_{k-1} A^{k-1} = 0$$

Conclude that $q_A(t) = (\alpha_{k-1} t^{k-1} + \cdots + \alpha_2 t^2 + \alpha_1 t + \alpha_0)/\alpha_{k-1}$ is the
minimal polynomial of $A$. Why is $\alpha_{k-1} \ne 0$?

6. Carry out the computations required by the algorithm in Problem 5
to determine the minimal polynomials of [ i |. [ o i |, and lo aE

0
7. Consider A= F | and B= [3 | to show that the minimal polynomials
of $AB$ and $BA$ need not be the same. The characteristic polynomials
of $AB$ and $BA$ are the same, however. Explain why there is this
difference between the characteristic and minimal polynomials.

8. Let $A_i \in M_{n_i}$, $i = 1, \ldots, k$, be given and let $q_i(t)$ denote the minimal
polynomial of each $A_i$. Show that the minimal polynomial of the direct sum

$$A = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_k \end{bmatrix}$$

is the least common multiple of $q_1(t), q_2(t), \ldots, q_k(t)$. This is the unique
monic polynomial of minimum degree that is divisible by each $q_i(t)$.
Notice that this argument gives a different proof for Lemma (1.3.10).

9. If $A \in M_5$ has characteristic polynomial $p_A(t) = (t-4)^3 (t+6)^2$ and
minimal polynomial $q_A(t) = (t-4)^2 (t+6)$, what is the Jordan canonical
form of $A$?

10. Show by direct computation that the polynomial (3.3.11) is the characteristic
polynomial of the companion matrix (3.3.12). Hint: Use cofactors
to compute the determinant.

11. Sometimes one sees the companion matrix of the polynomial (3.3.11)
defined to be

$$\begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & & 0 \\ & 1 & \ddots & & \vdots \\ & & \ddots & 0 & \\ 0 & & & 1 & 0 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 0 & 1 & & & 0 \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ 0 & & & 0 & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix}$$

Show that both of these matrices share with (3.3.12) the property that
(3.3.11) is both the minimal and characteristic polynomial of the matrix.


12. Show that there is no real 3-by-3 matrix whose minimal polynomial
is $x^2 + 1$, but that there is a real 2-by-2 matrix as well as a complex 3-by-3
matrix with this property. Hint: Use (3.3.4).

13. Although similar matrices have the same characteristic and minimal
polynomials, show that two matrices of order 4 or more can have the
same minimal and characteristic polynomial without being similar. Hint:
Consider

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Show that 4 is the minimum order for which this can occur.

14. If $A, B \in M_n$ are similar, and if $p(t)$ is a polynomial, then $p(A) = 0$ if
and only if $p(B) = 0$. Use the example in the preceding problem to show
that it is possible to have $p(A) = 0$ if and only if $p(B) = 0$ for every polynomial
$p(t)$ even if $A$ and $B$ are not similar. How can this occur?

15. Let $A \in M_n$ be a given matrix, and let $P(A) = \{p(A) : p(t)$ is a polynomial$\}$.
Show that $P(A)$ is a subspace of $M_n$, and that it is even a subalgebra
of $M_n$ [$P(A)$ is closed under products]. Show that the dimension
of $P(A)$ is the degree of the minimal polynomial of $A$.

16. If $A, B \in M_n$ have the same characteristic polynomial and the same
minimal polynomial, and their minimal polynomial is the same as their
characteristic polynomial, show that $A$ and $B$ are similar. Use this fact to
show that the various alternate forms for the companion matrix noted in
Problem 11 are all similar to (3.3.12).


3.4 Other canonical forms and factorizations


In addition to the Jordan canonical form, there are several other matrix
factorizations that can be useful in various circumstances.

We consider first a variant of the Jordan canonical form (3.1.12)
when the matrix $A$ has only real entries. In this case, all the nonreal
eigenvalues must occur in conjugate pairs. Moreover, if $A$ is real, then
$\operatorname{rank}(A - \lambda I)^k = \operatorname{rank}\overline{(A - \lambda I)^k} = \operatorname{rank}(A - \bar\lambda I)^k$ for all $\lambda \in C$ and all
$k = 1, 2, \ldots$, and hence the structure of the Jordan blocks corresponding to
any eigenvalue $\lambda$ is the same as the structure of the Jordan blocks corresponding
to the conjugate eigenvalue $\bar\lambda$. Thus, all the Jordan blocks of all
sizes (not just the 1-by-1 blocks) corresponding to nonreal eigenvalues
occur in conjugate pairs of equal size.

For example, if $\lambda$ is a nonreal eigenvalue of the real matrix $A$, and if
$J_2(\lambda)$ appears in the Jordan canonical form of $A$ with a certain multiplicity,
$J_2(\bar\lambda)$ must also appear with the same multiplicity. The block matrix

$$\begin{bmatrix} J_2(\lambda) & 0 \\ 0 & J_2(\bar\lambda) \end{bmatrix} = \begin{bmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \bar\lambda & 1 \\ 0 & 0 & 0 & \bar\lambda \end{bmatrix} \tag{3.4.1}$$

is permutation-similar (interchange rows and columns 2 and 3) to the
block matrix

$$\begin{bmatrix} \lambda & 0 & 1 & 0 \\ 0 & \bar\lambda & 0 & 1 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \bar\lambda \end{bmatrix} = \begin{bmatrix} D(\lambda) & I \\ 0 & D(\lambda) \end{bmatrix}$$

where

$$D(\lambda) = \begin{bmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{bmatrix} \in M_2 \quad \text{and} \quad I \in M_2$$

In general, any Jordan matrix of the form

$$\begin{bmatrix} J_k(\lambda) & 0 \\ 0 & J_k(\bar\lambda) \end{bmatrix} \in M_{2k} \tag{3.4.2}$$

is permutation-similar to the block matrix

$$\begin{bmatrix} D(\lambda) & I & & 0 \\ & D(\lambda) & I & \\ & & \ddots & I \\ 0 & & & D(\lambda) \end{bmatrix} \in M_{2k}$$

with $k$ blocks $D(\lambda)$ on the main diagonal and $k-1$ 2-by-2 identity matrices
on the superdiagonal.
Each 2-by-2 diagonal block $D(\lambda)$ is similar to a real 2-by-2 matrix

$$S D(\lambda) S^{-1} = \begin{bmatrix} a & b \\ -b & a \end{bmatrix} = C(a, b) \tag{3.4.3}$$


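where $\lambda = a + ib$, $a, b \in R$, and $S = \begin{bmatrix} -i & -i \\ 1 & -1 \end{bmatrix}$. Thus, every block pair of
conjugate 2-by-2 Jordan blocks (3.4.1) with nonreal $\lambda$ is similar via $\begin{bmatrix} S & 0 \\ 0 & S \end{bmatrix}$
to a real 4-by-4 block of the form

$$\begin{bmatrix} a & b & 1 & 0 \\ -b & a & 0 & 1 \\ 0 & 0 & a & b \\ 0 & 0 & -b & a \end{bmatrix} = \begin{bmatrix} C(a, b) & I \\ 0 & C(a, b) \end{bmatrix}$$

In general, every block pair of conjugate $k$-by-$k$ Jordan blocks (3.4.2)
with nonreal $\lambda$ is similar to a real $2k$-by-$2k$ block of the form

$$C_k(a, b) = \begin{bmatrix} C(a, b) & I & & 0 \\ & C(a, b) & I & \\ & & \ddots & I \\ 0 & & & C(a, b) \end{bmatrix} \tag{3.4.4}$$

These observations lead us to the real Jordan canonical form.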
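3.4.5 Theorem. Each real matrix $A \in M_n(R)$ is similar to a block diagonal
real matrix of the form

$$\begin{bmatrix} C_{n_1}(a_1, b_1) & & & & & \\ & C_{n_2}(a_2, b_2) & & & 0 & \\ & & \ddots & & & \\ & & & C_{n_p}(a_p, b_p) & & \\ & 0 & & & J_{n_{p+1}}(\lambda_{p+1}) & \\ & & & & & \ddots \\ & & & & & & J_{n_q}(\lambda_q) \end{bmatrix} \tag{3.4.6}$$

where $\lambda_k = a_k + ib_k$ is a nonreal eigenvalue of $A$ for $k = 1, 2, \ldots, p$, $a_k$ and
$b_k$ are real, and $\lambda_{p+1}, \ldots, \lambda_q$ are the real eigenvalues of $A$. Each real block
triangular matrix $C_{n_k}(a_k, b_k) \in M_{2n_k}$ is of the form (3.4.4) and corresponds
to a pair of conjugate Jordan blocks $J_{n_k}(\lambda_k), J_{n_k}(\bar\lambda_k) \in M_{n_k}$ with
nonreal $\lambda_k$ in the Jordan canonical form (3.1.12) of $A$. The real Jordan
blocks $J_{n_k}(\lambda_k)$ in (3.4.6) are exactly the Jordan blocks in (3.1.12) with
real $\lambda_k$.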
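
The 2-by-2 similarity behind the blocks $C(a, b)$ can be checked numerically. The sketch below assumes the matrix $S$ with rows $(-i, -i)$ and $(1, -1)$, which is one choice that carries $D(\lambda) = \operatorname{diag}(\lambda, \bar\lambda)$ into the real matrix with rows $(a, b)$ and $(-b, a)$; the sample values of $a$ and $b$ are arbitrary.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Inverse of a nonsingular 2-by-2 matrix via the adjugate formula."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

a, b = 1.0, 2.0                      # sample real and imaginary parts
lam = complex(a, b)
D = [[lam, 0], [0, lam.conjugate()]]
S = [[-1j, -1j], [1, -1]]            # assumed similarity matrix

C = mat_mul(mat_mul(S, D), inverse_2x2(S))
expected = [[a, b], [-b, a]]
close = all(abs(C[i][j] - expected[i][j]) < 1e-12
            for i in range(2) for j in range(2))
print(close)
```

Because the arithmetic is done in floating point, the comparison uses a small tolerance rather than exact equality.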

We have deduced the real Jordan form of a real matrix from its general
(complex) Jordan canonical form (3.1.12). This approach has the
advantage of showing exactly how the sizes and numbers of the real
blocks $C_{n_k}(a_k, b_k)$ are related to the complex Jordan block structure of
$A$. The disadvantage of this approach, however, is that it is not evident
that the similarity matrix that transforms $A$ into (3.4.6) can be chosen to
be real.

In fact, if $A$ is real, there is always a real nonsingular matrix $S$ such
that $S^{-1}AS$ is in real Jordan form (3.4.6). One can prove this by following
the three steps in the proof of the Jordan canonical form theorem in 
(3.1), starting with the real form (2.3.4) of Schur’s triangularization 
theorem instead of the complex form (2.3.1). In steps 2 and 3 one can 
mimic the arguments in the complex case to show that one can use a real 
similarity in each case to reduce to modified triangular or Jordan diag- 
onal blocks in which there may be real 2-by-2 blocks C(a, b) of the form 
(3.4.3) on the main diagonal. 

The complex Jordan canonical form (3.1.12) is a direct sum of upper 
triangular matrices, and the real Jordan form (3.4.6) is a direct sum of 
Hessenberg or “almost upper triangular” matrices since each 2-by-2 real 
block C(a, b) has one entry below the main diagonal. 

It is also possible to develop canonical forms that are a direct sum of 
companion matrices. These forms have some appeal for complex ma- 
trices but have the advantage that they are valid for fields other than C, 
for which the Jordan canonical form is not available. 

Let $A \in M_n$ be a given matrix, and let its Jordan canonical form be
(3.1.12). Group together all the Jordan blocks corresponding to each distinct
eigenvalue. From each group, select a Jordan block of largest order
and remove it from the group. Let $B_1$ denote the direct sum of all these
removed blocks. There will be as many direct summands in $B_1$ as there
are distinct eigenvalues of $A$. Now select from the remaining blocks in
each group a Jordan block of largest order and remove it from the
group. Let $B_2$ denote the direct sum of all these blocks. There may be
fewer direct summands in $B_2$ than in $B_1$ because some group of blocks
may now be empty; that is, some eigenvalue of $A$ may have only one
Jordan block associated with it. Continue this process to form direct
sums $B_1, B_2, B_3, \ldots, B_s$ until all the groups of Jordan blocks are empty.
The sizes of $B_k$ are monotone nonincreasing. Then $B_1 \oplus B_2 \oplus \cdots \oplus B_s$ is
permutation-similar to the original Jordan form (3.1.12) of $A$.

Because of the way the direct sums $B_k$ have been constructed, the minimal
and characteristic polynomials of each $B_k$ are the same. In fact, the
characteristic (minimal) polynomial of $B_1$ is exactly the minimal polynomial
of $A$. Thus, each $B_k$ is similar to the companion matrix of its
characteristic (minimal) polynomial by (3.3.15).

The characteristic (minimal) polynomials of the matrices $B_k$ are known
as the invariant factors $f_k(t)$ of $A$. Notice that their degrees are monotone
nonincreasing and that each $f_{k+1}(t)$ divides $f_k(t)$ for $k = 1, 2, \ldots, s-1$.
The first invariant factor $f_1(t) = q_A(t)$ is the minimal polynomial of $A$,
and the product of all the invariant factors is the characteristic polynomial
of $A$. The invariant factors are determined in a definite way by
the Jordan block structure of $A$, which is determined by the eigenvalues
$\lambda_i$ of $A$ and the sequence of ranks of powers of $(A - \lambda_i I)$. Thus, similar
matrices will have the same invariant factors and, since the invariant
factors also determine the Jordan block structure of $A$, two matrices with
the same invariant factors must be similar. Thus, the sequence of invariant
factors of a matrix (which includes the minimal polynomial and determines
the characteristic polynomial) is a complete set of polynomial similarity
invariants: Two matrices $A, B \in M_n$ are similar if and only if their
invariant factors are identical.
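
The grouping procedure that produces the invariant factors can be sketched as follows, assuming the Jordan block orders for each eigenvalue are already known. Each invariant factor is represented as a dictionary mapping an eigenvalue $\lambda$ to the exponent of $(t - \lambda)$; the function name and this representation are illustrative choices only.

```python
def invariant_factors(blocks):
    """blocks: dict mapping each eigenvalue to a list of its Jordan block orders.

    Returns [f_1, f_2, ...], where each f_k is a dict {eigenvalue: exponent}.
    """
    remaining = {lam: sorted(sizes, reverse=True) for lam, sizes in blocks.items()}
    factors = []
    while any(remaining.values()):
        factor = {}
        for lam, sizes in remaining.items():
            if sizes:
                factor[lam] = sizes.pop(0)   # remove a largest remaining block
        factors.append(factor)
    return factors

# Hypothetical Jordan structure: eigenvalue 1 (one block of order 1),
# eigenvalue 2 (one block of order 2), eigenvalue 4 (two blocks of order 1).
print(invariant_factors({1: [1], 2: [2], 4: [1, 1]}))
# [{1: 1, 2: 2, 4: 1}, {4: 1}]
```

Here $f_1(t) = (t-1)(t-2)^2(t-4)$ and $f_2(t) = t-4$; as expected, $f_2$ divides $f_1$ and their product has degree 5, the order of the matrix.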

Another way to characterize the invariant factors of $A$ is to define
$f_1(t) = (t-\lambda_1)^{r_1} \cdots (t-\lambda_m)^{r_m}$ to be the minimal polynomial of $A$. Now
delete from the Jordan form of $A$ one Jordan block corresponding to
each factor $(t-\lambda_i)^{r_i}$ of $f_1(t)$ (these are just the Jordan blocks that comprise
$B_1$) and let $f_2(t) = (t-\lambda_1)^{s_1} \cdots (t-\lambda_m)^{s_m}$ be the minimal polynomial
of the remaining Jordan form. Now delete one block corresponding to
each factor $(t-\lambda_i)^{s_i}$ and let $f_3(t)$ be the minimal polynomial of what
remains, and so on. The invariant factors $f_k(t)$ are just the minimal
polynomials of a series of successively deflated matrices in which certain
Jordan blocks are removed at each step.

The characterization of similar matrices in terms of invariant factors
is conceptually pleasant, since it shows clearly why the minimal and characteristic
polynomials are generally insufficient to distinguish similarity,
but it really adds nothing to the criterion we already know: Two matrices
are similar if and only if their Jordan canonical forms are the same.
On the other hand, this characterization leads to a new canonical form
for $A$ known as the rational form because the invariant factors can be
computed using only rational operations on the entries of the matrix $A$.


3.4.7 Theorem. Every matrix $A \in M_n$ is similar to the direct sum of the
companion matrices of its invariant factors.

Since we already have the Jordan canonical form, the rational form
(3.4.7) for complex matrices may not seem to have any advantages. The
reason for introducing it is that a version of it is true over every field, not
just the complex numbers. Any matrix over a field $F$ is similar over $F$ to
the direct sum of the companion matrices of its invariant factors, which
are uniquely determined polynomials with coefficients from $F$. We illustrate
this for the real field.

If $A \in M_n(R)$ is a given real matrix, then $A$ is similar (by a possibly
complex similarity transformation) to the direct sum $B_1 \oplus B_2 \oplus \cdots \oplus B_s$,
which, in turn, is similar to a direct sum of the companion matrices of its
invariant factors (the characteristic polynomials of the $B_k$ terms, $k = 1, \ldots, s$).
The Jordan blocks of $A$ that correspond to nonreal eigenvalues
must occur in conjugate pairs. Therefore, if a block corresponding to a
nonreal eigenvalue occurs in any $B_k$, its conjugate must also occur in the
same $B_k$, $k = 1, \ldots, s$. Thus, each $B_k$ has a real characteristic polynomial,
and the form guaranteed by (3.4.7) is a real matrix. This form, the
rational form for real matrices, may actually be achieved by a real similarity,
for which we omit a proof.


3.4.8 Theorem. Every real matrix $A \in M_n(R)$ is similar over $R$ to a
direct sum of companion matrices of real monic polynomials $p_1(t),
p_2(t), \ldots, p_s(t)$ in which each $p_{k+1}(t)$ divides $p_k(t)$ for $k = 1, 2, \ldots, s-1$.
The polynomial $p_1(t)$ is the minimal polynomial of $A$ over $R$, the product
$p_1(t) \cdots p_s(t)$ is the characteristic polynomial of $A$, and each $p_k(t)$ is an
invariant factor of $A$ over $R$. The polynomials $p_k(t)$ are uniquely determined,
so two real matrices are similar over $R$ if and only if they have the
same invariant factors.

We emphasize that a theorem of the same form is true over the field $Q$
of rational numbers or any other field. The rational form gets its name
from the fact that reduction of a matrix $A \in M_n(F)$ to the stated form
can, in principle, be accomplished by finitely many rational computations
on the entries of $A$ that stay within the field $F$. Thus, if $F$ is the field
of rational numbers, only similarities with rational entries and polynomials
with rational coefficients are used.

A different canonical form involving companion matrices can also be
derived from the Jordan canonical form (3.1.12). Observe that every
individual Jordan block has the property that its minimal and characteristic
polynomials are identical. Thus, each Jordan block $J_{n_i}(\lambda_i)$ is similar
to the companion matrix of its characteristic polynomial $(t-\lambda_i)^{n_i}$.
The whole Jordan canonical form is therefore similar to a direct sum of
companion matrices of these polynomials $(t-\lambda_i)^{n_i}$; these polynomials
are known as the elementary divisors of $A$. Notice that there are generally
more direct summands in this way of factorizing $A$ than in the
rational form; each invariant factor may yield several elementary divisors.
The product of all the elementary divisors of $A$ is the characteristic
polynomial of $A$.

If $A \in M_n(R)$, compute its Jordan form and elementary divisors over
$C$ and notice that they must occur in conjugate pairs. If blocks $J_{n_i}(\lambda)$ and
$J_{n_i}(\bar\lambda)$ are combined as a direct sum, the resulting block has the real polynomial
$(t-\lambda)^{n_i}(t-\bar\lambda)^{n_i}$ as its characteristic and minimal polynomial, and
hence it is similar to the real companion matrix of $[t^2 - (2\operatorname{Re}\lambda)t + |\lambda|^2]^{n_i}$.
The latter polynomial is a real elementary divisor of $A$. Powers of real linear
factors also occur as elementary divisors for each real eigenvalue of $A$.

The canonical form associated with elementary divisors is, with inevi- 
table confusion, usually called the rational canonical form. 


3.4.9 Theorem. Every matrix $A \in M_n(R)$ is similar over $R$ to the direct
sum of companion matrices of its (real) elementary divisors.

The same sort of result is true over any field $F$: Every $A \in M_n(F)$ is
similar over $F$ to the direct sum of companion matrices of its elementary
divisors, which are polynomials with coefficients from $F$.

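As an example, consider the matrices

$$A_1 = [1], \quad A_2 = \begin{bmatrix} 0 & -4 \\ 1 & 4 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & -9 \\ 1 & 0 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}$$

and let $A = A_1 \oplus A_2 \oplus A_3 \oplus A_3 \oplus A_4 \in M_9$. Then the rational canonical
form of $A$ over $R$ is $A = A_1 \oplus A_2 \oplus A_3 \oplus A_3 \oplus [4] \oplus [4]$, and the elementary
divisors are $x-1$, $(x-2)^2$, $x^2+9$, $x^2+9$, $x-4$, $x-4$. Over $C$, the rational
canonical form of $A$ is $A_1 \oplus A_2 \oplus [3i] \oplus [3i] \oplus [-3i] \oplus [-3i] \oplus [4] \oplus [4]$,
and the elementary divisors are $x-1$, $(x-2)^2$, $x-3i$, $x-3i$, $x+3i$, $x+3i$,
$x-4$, $x-4$. The rational form of $A$ over $R$ is (the companion matrix of
$A_1 \oplus A_2 \oplus A_3 \oplus [4]$) $\oplus$ (the companion matrix of $A_3 \oplus [4]$) and the invariant
factors are

$$f_1(t) = (t-1)(t-2)^2(t^2+9)(t-4) \quad \text{and} \quad f_2(t) = (t^2+9)(t-4)$$

Over $C$, the rational form of $A$ is (the companion matrix of $A_1 \oplus A_2 \oplus
[3i] \oplus [-3i] \oplus [4]$) $\oplus$ (the companion matrix of $[3i] \oplus [-3i] \oplus [4]$). Notice
that the two direct summands and the invariant factors are the same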
whether $A$ is thought of as a matrix in $M_n(R)$ or $M_n(C)$. This is not true
of the rational canonical form and the elementary divisors. See Problems
2 and 3 at the end of this section.

We do not use the real Jordan form, the rational form, the rational
canonical form, invariant factors, or elementary divisors in any essential
way in the rest of the book. We have discussed them here only because of
their historical importance and their necessity when one does matrix
analysis over fields other than $C$.

There are many other useful canonical forms and matrix factorizations:

(a) The polar decomposition: Every $A \in M_n$ can be written as $A = PU$,
    where $P \in M_n$ is a positive semidefinite matrix whose rank is
    the same as that of $A$, and $U \in M_n$ is a unitary matrix. See (7.3.3).
    Every nonsingular matrix $A \in M_n$ can also be written as $A = GQ$,
    where $G \in M_n$ is a (complex) symmetric matrix ($G = G^T$) and
    $Q \in M_n$ is a (complex) orthogonal matrix ($QQ^T = I$).

(b) The singular value decomposition: Every $A \in M_n$ can be written
    as $A = V\Sigma W^*$, where $V, W \in M_n$ are unitary, and $\Sigma \in M_n$ is a
    diagonal matrix with nonnegative main diagonal entries and the
    rank of $\Sigma$ is the same as the rank of $A$. See (7.3.5).

(c) The triangular factorization: Every $A \in M_n$ can be written as
    $A = URU^*$, where $U \in M_n$ is unitary and $R \in M_n$ is upper triangular.
    Every real matrix $A \in M_n(R)$ can be written as $A = QRQ^T$,
    where $Q \in M_n(R)$ is orthogonal and $R \in M_n(R)$ is an upper
    Hessenberg matrix with a special structure. See (2.3.5).

(d) Every Hermitian matrix $A \in M_n$ can be written as $A = S\,I(A)\,S^*$,
    where $S \in M_n$ is nonsingular and $I(A) \in M_n$ is a diagonal matrix
    with $+1$, $-1$, or $0$ main diagonal entries. The number of $+1$
    ($-1$) entries in $I(A)$ is the same as the number of positive
    (negative) eigenvalues of $A$; the number of 0 entries is equal to
    $n - \operatorname{rank} A$. See (4.5.8).

(e) Every normal matrix $A \in M_n$ can be written as $A = U\Lambda U^*$, where
    $U \in M_n$ is unitary and $\Lambda \in M_n$ is a diagonal matrix whose main
    diagonal entries are the eigenvalues of $A$. Every real normal
    matrix $A \in M_n(R)$ can be written as $A = QDQ^T$, where $Q \in M_n(R)$
    is orthogonal and $D \in M_n(R)$ is a block diagonal matrix with a
    special structure. See (2.5.8).

(f) Every matrix $A \in M_n$ such that $A = A^T$ can be written as
    $A = S\,K(A)\,S^T$, where $S \in M_n$ is nonsingular and $K(A) \in M_n$ is a diagonal
    matrix whose main diagonal entries are $+1$ or $0$ and whose
    rank is equal to the rank of $A$. See (4.5.12).

(g) Every matrix $A \in M_n$ such that $A = A^T$ can be written as
    $A = U\Sigma U^T$, where $U \in M_n$ is unitary and $\Sigma$ is a diagonal matrix with
    nonnegative main diagonal entries. The rank of $\Sigma$ is equal to the
    rank of $A$. See (4.4.4).

(h) Every unitary matrix $U \in M_n$ can be written as $U = Qe^{iE}$ and every
    complex orthogonal matrix $P \in M_n$ can be written as $P = Qe^{iF}$,
    where $Q, E, F \in M_n(R)$, $Q$ is real orthogonal ($QQ^T = I$), $E$ is real
    symmetric ($E = E^T$), and $F$ is real skew-symmetric ($F = -F^T$).

(i) Every matrix $A \in M_n$ can be written as $A = SU\Sigma U^T S^{-1}$, where $S$
    is nonsingular, $U$ is unitary, and $\Sigma$ is a diagonal matrix with
    nonnegative main diagonal entries. See (4.4.10).


Problems


1. Compute the minimal and characteristic polynomials, the invariant
factors, elementary divisors, rational form, and rational canonical form
over $R$ and $C$ for

    0 0 -1
    10 0      and      $\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$
    1o 0

2. Let $A \in M_n(R)$. Suppose $q(t)$ is the minimal polynomial of $A$ over
$R$ and $f(t)$ is the minimal polynomial of $A$ over $C$. Why is degree $f(t) \le$
degree $q(t)$? Why must $f(t)$ divide $q(t)$? Show that $f(t) = q(t)$ by considering
$f(t) = p_1(t) + ip_2(t)$, where $p_1(t)$ and $p_2(t)$ have real coefficients.
Why is $p_1(A) = p_2(A) = 0$?

3. Use Theorem (3.4.7) to show that if $A, B \in M_n(F)$ and if $F$ is a subfield
of $C$ ($F = R$ or $Q$, for example), then $A$ and $B$ are similar over $F$ if
and only if they are similar over $C$. Hint: Show that the rational form of
$A$ over $F$ is the same as the rational form of $A$ over $C$, and similarly for
$B$. How does this result generalize Problem 2?

4. Let $A \in M_n(R)$, and suppose $A^2 = -I$. Show that $n$ must be even, and
that there is a real nonsingular matrix $S \in M_n$ such that

$$S^{-1}AS = \begin{bmatrix} 0 & -I \\ I & 0 \end{bmatrix}$$

where each identity $I \in M_{n/2}$.


Further Readings. The rational forms mentioned in this section are quite
classical and can be found in more detail in [HKu]. The real Jordan
canonical form seems also to have been known to matrix theorists for
some time, but presentations of it are not common. A statement of the
real Jordan form can be found in [Kow], for example. See [New] for a
discussion of canonical forms of matrices with rational or integer entries.
See [Gan], vol. 2; [Gant]; and [HJ] for additional details on special
canonical forms.


3.5 Triangular factorizations


If a linear system $Ax = b$ has a nonsingular triangular (0.9.3) coefficient
matrix $A \in M_n$, computation of the unique solution $x$ is remarkably easy.
If, for example, $A$ is upper triangular,
then $\det A = a_{11}a_{22} \cdots a_{nn} \ne 0$ and back substitution is used: $a_{nn}x_n = b_n$
determines $x_n$; $a_{n-1,n-1}x_{n-1} + a_{n-1,n}x_n = b_{n-1}$ is then one equation in
one unknown, which determines $x_{n-1}$; and, in general, each of the
sequence of equations

$$\sum_{j=i}^{n} a_{ij}x_j = b_i, \qquad i = n, n-1, n-2, \ldots, 2, 1$$

is one equation in one unknown (once $x_{i+1}, \ldots, x_n$ have been determined),
which determines $x_i$.


Exercise. Count the number of scalar multiplication and division operations
necessary to solve $Ax = b$, if $A \in M_n$ is nonsingular and upper triangular,
if one uses back substitution.


Exercise. Describe forward substitution as a solution technique for
$Ax = b$ if $A \in M_n$ is nonsingular and lower triangular.


If $A \in M_n$ is nonsingular but not triangular, notice that it is almost as
convenient to solve $Ax = b$ if $A$ is given in factored form as

$$A = LU$$

in which $L$ is lower triangular and $U$ is upper triangular.


Exercise. If $A = LU$, as above, is nonsingular, show that $L$ and $U$
must both be nonsingular and must therefore have nonzero diagonal
entries.


In order to solve $Ax = b$, we may first solve

    $Ly = b$   by forward substitution
and then
    $Ux = y$   by backward substitution

and the necessary computational effort is only twice as great as in the
simple triangular case. Thus, factorizations such as $LU$ can be helpful in
solving linear systems if the cost of achieving them is not too great. They
are also appropriate to mention here, as they are special forms into which
a matrix can be put, motivated now not by eigenvalues but by linear
systems.
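
The two triangular solves described in this section can be sketched as follows. The code assumes that nonsingular triangular factors $L$ and $U$ are already available (it does not compute them) and uses exact fractions so that the small example can be checked by hand; the function names are illustrative.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve Ly = b for nonsingular lower triangular L, from the top row down."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - known) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve Ux = y for nonsingular upper triangular U, from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - known) / U[i][i]
    return x

def solve_lu(L, U, b):
    """Solve (LU)x = b by one forward and one backward substitution."""
    return back_substitution(U, forward_substitution(L, b))

# Hypothetical example: A = LU = [[2, 1], [6, 7]].
L = [[1, 0], [3, 1]]
U = [[2, 1], [0, 4]]
x = solve_lu(L, U, [3, 13])
print(x == [1, 1])  # True
```

Each substitution solves one equation in one unknown per row, which is why the total effort is about twice that of a single triangular solve.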


3.5.1 Lemma. Suppose that $A \in M_n$ can be written

$$A = LU$$

with $L \in M_n$ lower triangular and $U \in M_n$ upper triangular. For any
partition

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad L = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}, \quad U = \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}$$

with $A_{11}, L_{11}, U_{11} \in M_k$, $k \le n$, we have

$$L_{11}U_{11} = A_{11}$$
$$L_{11}U_{12} = A_{12}, \qquad L_{21}U_{11} = A_{21}$$

and

$$L_{21}U_{12} + L_{22}U_{22} = A_{22}$$

In particular, the upper left blocks of $L$ and $U$ must form a factorization,
of the same type, of the corresponding block of $A$.


Exercise. Verify (3.5.1) by carrying out the partitioned multiplication.


3.5.2 Theorem. Suppose that $A \in M_n$ and that $\operatorname{rank} A = k$. If

$$\det A(\{1, \ldots, j\}) \ne 0, \qquad j = 1, \ldots, k$$

then $A$ may be factored as

$$A = LU$$

with $L \in M_n$ lower triangular and $U \in M_n$ upper triangular. Furthermore,
the factorization may be chosen so that either $L$ or $U$ is nonsingular; both
$L$ and $U$ may be chosen nonsingular if and only if $k = n$, that is, if and
only if $A$ is nonsingular.


Each time there is one equation in one unknown to be solved. This equation
will be solvable since each $l_{ii}$ is nonzero [because $\det L(\{1, \ldots, j\}) \times
\det U(\{1, \ldots, j\}) = \det A(\{1, \ldots, j\})$ by (3.5.1)]. This completes the factorization
of $A(\{1, \ldots, k\})$.

Partition $A$ as in (3.5.1). Since $\operatorname{rank} A = k = \operatorname{rank} A_{11}$, we see that
the rows of $[A_{21}\ A_{22}]$ are unique linear combinations of the rows of
$[A_{11}\ A_{12}]$, that is,

$$A_{21} = BA_{11} \quad \text{and} \quad A_{22} = BA_{12}$$

for some uniquely determined $B \in M_{n-k,k}$. Now partition the desired $L$
and $U$, also as in (3.5.1), noting that nonsingular $L_{11}$ and $U_{11}$ have been
determined. We may then use (3.5.1) to solve for

$$U_{12} = L_{11}^{-1}A_{12} \quad \text{and} \quad L_{21} = A_{21}U_{11}^{-1}$$

Then

$$\begin{aligned}
A_{22} = L_{21}U_{12} + L_{22}U_{22} &= A_{21}U_{11}^{-1}L_{11}^{-1}A_{12} + L_{22}U_{22} = BA_{11}A_{11}^{-1}A_{12} + L_{22}U_{22} \\
&= A_{22} + L_{22}U_{22}
\end{aligned}$$

To complete the factorization, it is necessary and sufficient that

$$L_{22}U_{22} = 0$$

We may, for example, choose $L_{22}$ (respectively $U_{22}$) to be any nonsingular
lower (respectively upper) triangular matrix in $M_{n-k}$ we like and choose
$U_{22}$ (respectively $L_{22}$) to be 0. Since $L_{11}$ and $U_{11}$ are nonsingular, either $L$
or $U$ may be chosen to be nonsingular. If $k = n$, $L = L_{11}$ and $U = U_{11}$ will
be nonsingular; if $k < n$, not both $L$ and $U$ can be nonsingular because $A$
is singular. This completes the proof. □


Proof: We first show that, under the assumption on leading minors, 
A(f{l,...,k}) may be factored as L({1, ..., KD U({1,..., k}), with both non- 
singular. It is possible to solve for the relevant entries of L and U, one 
by one. Let L=[/,;] and U=[u,;]. Set up = 1, and let li = aa, i=1,...,k. 
Solve for 


3.5.3 Example. Not every matrix has an LU factorization. Consider 


If A could be written as 


hy 0 0 | 
A=LU= 
p hy || 0 un 


hiu =0 would require that one of L or U be singular, but LU =A is 
nonsingular. 


Continue. Set v2. = 1 and let l2 =a; — latu, i=2,...,k. Solve for 


ay; — ly U4; . 
af J _ 
Uj = , J=3,...,k 


hz 


Exercise. Show that a nonsingular matrix that has an upper left k-by-k 
singular principal submatrix cannot have an LU factorization. 


Continue, letting successive diagonal entries of U be 1 and then solving 
for the next column of L({1,...,&}) and then the next row of U({1,..., K) 
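
The column-by-column construction in the proof of (3.5.2), together with the two triangular solves $Ly = b$ and $Ux = y$, can be sketched as follows for the nonsingular case $k = n$. This is only an illustration under stated assumptions: the function names (`crout_lu`, `forward_sub`, `back_sub`, `solve`) are hypothetical, exact rational arithmetic from Python's `fractions` module is used for clarity rather than efficiency, and every leading principal minor of the input is assumed to be nonzero.

```python
from fractions import Fraction


def crout_lu(a):
    """Factor a square matrix as A = L U with all diagonal entries of U equal to 1.

    Assumes every leading principal minor of A is nonzero; otherwise a
    division by zero (ZeroDivisionError) occurs.
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    L = [[Fraction(0)] * n for _ in range(n)]
    U = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        # Next column of L: l_ik = a_ik - sum_{p<k} l_ip u_pk.
        for i in range(k, n):
            L[i][k] = a[i][k] - sum(L[i][p] * U[p][k] for p in range(k))
        # Next row of U: u_kj = (a_kj - sum_{p<k} l_kp u_pj) / l_kk.
        for j in range(k + 1, n):
            U[k][j] = (a[k][j] - sum(L[k][p] * U[p][j] for p in range(k))) / L[k][k]
    return L, U


def forward_sub(L, b):
    """Solve L y = b for nonsingular lower triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[j] * y[j] for j in range(i))) / row[i])
    return y


def back_sub(U, y):
    """Solve U x = y for nonsingular upper triangular U."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


def solve(a, b):
    """Solve A x = b via A = L U, then L y = b, then U x = y."""
    L, U = crout_lu(a)
    return back_sub(U, forward_sub(L, [Fraction(v) for v in b]))
```

For instance, `solve([[2, 1], [4, 5]], [3, 9])` factors the coefficient matrix once and then performs one forward and one backward substitution; the matrix of Example (3.5.3) would instead fail with a division by zero, since its 1,1 entry vanishes.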


3.5.4 Example. It is possible for $A \in M_n$ to have an LU factorization
without satisfying the principal minor conditions of (3.5.2). For example,

$$\begin{bmatrix} 0 & 0 \\ 1 & 2 \end{bmatrix} =
\begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

has rank 1, but its 1,1 entry is 0.


Exercise. The LU factorization in (3.5.4) is not unique, even if the diag-
onal entries of $U$ are required to be 1. Give several other factorizations of

$$\begin{bmatrix} 0 & 0 \\ 1 & 2 \end{bmatrix}$$

It should now be clear that an LU factorization of a given matrix can 
be highly nonunique and may or may not exist. Much of the trouble, 
however, arises from singularity, either in A or in its leading principal 
submatrices. Using the tools of (3.5.1) and (3.5.2), however, we can give 
a full description in the nonsingular case, and we can impose a normali- 
zation that makes the factorization unique (canonical). 


3.5.5 Corollary. Suppose that $A \in M_n$ is nonsingular. Then $A$ may be
written as

$$A = LU$$

with $L \in M_n$ lower triangular and $U \in M_n$ upper triangular, if and only if

$$\det A(\{1, \ldots, j\}) \ne 0, \qquad j = 1, \ldots, n$$

Furthermore, $L$ and $U$ are nonsingular and the factorization is essentially
unique. The matrix $A$ may be written as

$$A = L'DU'$$

in which $L'$ (respectively $U'$) $\in M_n$ is lower (respectively upper) trian-
gular with all diagonal entries equal to 1, and $D$ is a nonsingular diagonal
matrix determined by

$$\det D(\{1, \ldots, j\}) = \det A(\{1, \ldots, j\}), \qquad j = 1, \ldots, n$$

The factors $L'$, $U'$, and $D$ are uniquely determined by $A$.


Exercise. Provide details of a proof of (3.5.5) using (3.5.1), (3.5.2), and
prior exercises.
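
As a small worked instance of the normalized factorization in (3.5.5), consider the following 2-by-2 matrix, which is chosen here purely for illustration and does not come from the text. Both leading principal minors (2 and 6) are nonzero, so the corollary applies.

```latex
\begin{bmatrix} 2 & 1 \\ 4 & 5 \end{bmatrix}
=
\underbrace{\begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}}_{L'}
\underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}}_{D}
\underbrace{\begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix}}_{U'},
\qquad
\det D(\{1\}) = 2 = \det A(\{1\}), \quad
\det D(\{1,2\}) = 6 = \det A(\{1,2\}).
```

The diagonal entries of $D$ are thus ratios of successive leading principal minors: $d_{11} = 2$ and $d_{22} = 6/2 = 3$.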


Returning to the solution of the linear system 


Ax =b 


suppose that $A \in M_n$ cannot be factored as $LU$, but can be factored as
$PLU$, in which $P \in M_n$ is a permutation matrix (0.9.5) and $L$ and $U$ are
lower and upper triangular, as before. This amounts to a reordering of
the equations prior to factorization. In this event, solution of $Ax = b$ is
still quite simple via

$$Ly = P^Tb \quad \text{and} \quad Ux = y$$

It is worth realizing that any nonsingular $A \in M_n$ may be so factored,
and any $A \in M_n$ may be factored as $PLUQ$, in which $Q \in M_n$ is also a
permutation.


3.5.6 Lemma. Let $A \in M_k$ be nonsingular. Then there is a permutation
matrix $P \in M_k$ such that

$$\det (P^TA)(\{1, \ldots, j\}) \ne 0, \qquad j = 1, \ldots, k$$

Note that $P^TA$ is just a reordering of the rows of $A$.


Proof: The demonstration is by induction on $k$. If $k = 1$ or 2, the result is
clear by inspection; suppose that it is valid up to and including $k - 1$.
Consider a nonsingular $A \in M_k$ and delete its last column. The remaining
$k - 1$ columns are linearly independent and hence contain $k - 1$ linearly
independent rows. Permute these rows to the first $k - 1$ positions and
apply the induction hypothesis to the nonsingular upper $(k-1)$-by-$(k-1)$
submatrix. This determines a desired overall permutation. Since $P^TA$ is
nonsingular, the proof is complete. □


3.5.7 Theorem. Let $A \in M_n$. There exist permutation matrices $P, Q \in
M_n$, a lower triangular matrix $L \in M_n$, and an upper triangular matrix
$U \in M_n$ such that

$$A = PLUQ$$

If $A$ is nonsingular, one may take $Q = I$ and $A$ may be written as

$$A = PLU$$


Proof: If rank $A = k$, $A$ has a k-by-k nonsingular submatrix (0.4.4d),
which may, by permutation of rows and columns, be permuted into the
upper left corner. Now apply (3.5.6) to the upper left corner and apply
(3.5.2) to achieve a factorization of the first type. If $A$ is nonsingular,
(3.5.6) indicates that permutation on the right is unnecessary in order to
apply (3.5.2), which verifies the second factorization and completes the
proof. □
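
For a nonsingular matrix, a row reordering of the kind guaranteed by (3.5.6) can be found in practice during elimination itself. The sketch below is an assumption-laden illustration and not the inductive argument of the proof: it uses ordinary Gaussian elimination with row interchanges, takes the first available nonzero pivot, normalizes $L$ (rather than $U$) to have unit diagonal, and returns the permutation as an index list `perm` instead of a matrix $P$. The names `plu` and `matmul` are hypothetical.

```python
from fractions import Fraction


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(r[p] * y[p][j] for p in range(len(y))) for j in range(len(y[0]))]
            for r in x]


def plu(a):
    """Return (perm, L, U) such that row i of L U equals row perm[i] of A.

    A must be square and nonsingular (a singular input raises StopIteration
    when no nonzero pivot can be found). L has unit diagonal.
    """
    n = len(a)
    U = [[Fraction(x) for x in row] for row in a]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Find a row at or below k with a nonzero entry in column k.
        p = next(i for i in range(k, n) if U[i][k] != 0)
        U[k], U[p] = U[p], U[k]
        L[k], L[p] = L[p], L[k]
        perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = Fraction(1)
        # Zero out the entries below the pivot, recording the multipliers in L.
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [u - L[i][k] * v for u, v in zip(U[i], U[k])]
    return perm, L, U
```

In the notation of the text, the reordered matrix `[a[i] for i in perm]` plays the role of $P^TA$, so the output corresponds to a factorization $A = PLU$.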


Problems


1. The theory developed in this section is in terms of $LU$, with $L$ lower
triangular and $U$ upper triangular. Show that a parallel theory of $UL$ fac-
torization may be developed, but that the factors will, in general, be
different.


2. Recall from Problem 3 of Section (2.6) that the QR factorization
(2.6.1) of an arbitrary $A \in M_n$ may be achieved efficiently by $n - 1$ House-
holder transformations. Here, $Q$ is unitary and $R$ is upper triangular.
Describe how $Ax = b$ may be solved if $A$ is factored in QR form.


3. Show that $A \in M_n$ may be written as

$$A = LPU$$

in which $L \in M_n$ is a nonsingular lower triangular matrix, $U \in M_n$ is a
nonsingular upper triangular matrix, and $P$ is a sub-permutation matrix
[a permutation matrix with as many of the 1's replaced by 0's as the rank
of $A$ is less than $n$]. Hint: Use elementary row and column operations.


4. If the leading principal minors of $A \in M_n$ are all nonzero, describe
how $A$ may be LU-factored using type 3 elementary row operations to
zero out entries below the diagonal.


5. (Lanczos tridiagonalization algorithm.) Let $A \in M_n$ and $x \in \mathbf{C}^n$ be
given. Define $X = [x \;\; Ax \;\; A^2x \;\; \cdots \;\; A^{n-1}x]$. The columns of $X$ are said to
form a Krylov sequence. Assume that $X$ is nonsingular. (a) Show that
$X^{-1}AX$ is a companion matrix (3.3.12) for the characteristic polynomial
of $A$. (b) If $R \in M_n$ is any given nonsingular upper triangular matrix and
$S = XR$, show that $S^{-1}AS$ is in upper Hessenberg form. (c) Let $y \in \mathbf{C}^n$
and define $Y = [y \;\; A^*y \;\; (A^*)^2y \;\; \cdots \;\; (A^*)^{n-1}y]$. Suppose that $Y$ is nonsingular
and that $Y^*X$ can be written as $LDU$, in which $L$ is lower triangular and
$U$ is upper triangular and nonsingular, and $D$ is diagonal and nonsin-
gular. Show that there exist nonsingular upper triangular matrices $R$ and
$T$ such that $(XR)^{-1} = T^*Y^*$ and such that $T^*Y^*AXR$ is tridiagonal and
similar to $A$. (d) If $A \in M_n$ is Hermitian, use the above ideas to specify an
algorithm to produce a tridiagonal Hermitian matrix that is similar to $A$.


6. Two matrices $A, B \in M_{m,n}$ are said to be equivalent if there are non-
singular matrices $S \in M_m$ and $T \in M_n$ such that

$$B = SAT$$

(a) Show that this notion of equivalence is an equivalence relation on
$M_{m,n}$. (b) Show that every matrix $A \in M_{m,n}$ is equivalent to a matrix of
the form

$$\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} \in M_{m,n}, \qquad I \in M_k, \quad k \le \min\{m, n\}$$

Hint: Use elementary row
operations to attain row-reduced echelon form and then use elementary
column operations on the result. (c) Show that two matrices in $M_{m,n}$
are equivalent if and only if they have the same rank. (d) Suppose that
$A \in M_{m,n}$ is equivalent to the special form indicated in (b),
$S \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} T = A$.
Develop the solution theory for the linear system $Ax = b$ in terms of
equivalence.


Further Reading. Problem 5 above is adapted from [Ste], where addi-
tional information about the numerical application of LU factorizations
may be found.


CHAPTER 4 


Hermitian and symmetric matrices 


4.0 Introduction 


4.0.1 Example. If $f: D \to \mathbf{R}$ is a twice continuously differentiable func-
tion on some domain $D \subset \mathbf{R}^n$, the real matrix

$$H(x) = [h_{ij}(x)] = \left[ \frac{\partial^2 f(x)}{\partial x_i \, \partial x_j} \right] \in M_n$$

is known as the Hessian of $f$. It is a function of $x$ and plays an important
role in the theory of optimization because it can be used to determine if a
critical point is a relative maximum or minimum [see (7.0)].

For our purposes here, the only property of $H = H(x)$ that interests
us follows from the important fact that the mixed partials are equal;
that is,

$$\frac{\partial^2 f}{\partial x_i \, \partial x_j} = \frac{\partial^2 f}{\partial x_j \, \partial x_i}
\qquad \text{for all } i, j = 1, 2, \ldots, n$$

In terms of the Hessian matrix $H = [h_{ij}]$, this means that $h_{ij} = h_{ji}$ for
all $i, j = 1, 2, \ldots, n$; that is, $H = H^T$. A matrix $A \in M_n$ such that $A =
A^T$ is said to be symmetric. Thus, the Hessian matrix of a real-valued
twice continuously differentiable function is always a real symmetric
matrix.


4.0.2 Example. As a second example, let $A = [a_{ij}] \in M_n$ be a given
matrix with real or complex entries, and consider the quadratic form on
$\mathbf{R}^n$ or $\mathbf{C}^n$ generated by $A$:

$$Q(x) = x^TAx = \sum_{i,j=1}^{n} a_{ij}x_ix_j
= \sum_{i,j=1}^{n} \tfrac{1}{2}(a_{ij} + a_{ji})\,x_ix_j
= x^T\left[\tfrac{1}{2}(A + A^T)\right]x$$

Thus, $A$ and $\tfrac{1}{2}(A + A^T)$ both generate the same quadratic form, and the
latter matrix is symmetric. To study real or complex quadratic forms,
therefore, it suffices to study only those forms generated by symmetric
matrices. Real quadratic forms arise naturally in physics, for example, as
an expression for the inertia of a physical body.
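
The identity $x^TAx = x^T[\tfrac{1}{2}(A + A^T)]x$ is easy to check numerically. The following sketch does so for one small nonsymmetric matrix; the matrix, the vector, and the function names `quad_form` and `symmetric_part` are illustrative assumptions and are not taken from the text.

```python
def quad_form(a, x):
    """Evaluate x^T A x for a square matrix a (list of rows) and vector x."""
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetric_part(a):
    """Return the symmetric matrix (A + A^T) / 2."""
    n = len(a)
    return [[(a[i][j] + a[j][i]) / 2 for j in range(n)] for i in range(n)]


A = [[1, 4], [-2, 3]]        # not symmetric
S = symmetric_part(A)        # [[1.0, 1.0], [1.0, 3.0]]
x = [2, -1]

# Both matrices generate the same value of the quadratic form at x.
print(quad_form(A, x), quad_form(S, x))
```

Note that no conjugation is involved, so the same check works for complex vectors as well.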


4.0.3 Example. As a third example, consider a second-order linear
partial differential operator $L$ defined by

$$Lf = \sum_{i,j=1}^{n} a_{ij}(x)\,\frac{\partial^2 f}{\partial x_i \, \partial x_j} \qquad (4.0.4)$$

The coefficients $a_{ij}(x)$ and the function $f(x)$ are assumed to be defined
on the same domain $D \subset \mathbf{R}^n$, and $f$ should be twice continuously differen-
tiable on $D$. The operator $L$ is associated in a natural way with a matrix.
The matrix $A = [a_{ij}(x)]$ need not be symmetric, but since the mixed par-
tial derivatives of $f$ are equal, we have

$$Lf = \sum_{i,j=1}^{n} a_{ij}(x)\,\frac{\partial^2 f}{\partial x_i \, \partial x_j}
= \sum_{i,j=1}^{n} \tfrac{1}{2}\left[ a_{ij}(x)\,\frac{\partial^2 f}{\partial x_i \, \partial x_j}
+ a_{ji}(x)\,\frac{\partial^2 f}{\partial x_j \, \partial x_i} \right]
= \sum_{i,j=1}^{n} \tfrac{1}{2}\left[ a_{ij}(x) + a_{ji}(x) \right] \frac{\partial^2 f}{\partial x_i \, \partial x_j}$$

Thus, the symmetric matrix $\tfrac{1}{2}(A + A^T)$ yields the same operator $L$ as the
matrix $A$, and for the study of real or complex linear partial differential
operators of the form (4.0.4) it suffices to consider only symmetric coeffi-
cient matrices.


4.0.5 Example. Consider an undirected graph $\Gamma$; that is, $\Gamma$ consists of
a collection $N$ of "nodes" $\{P_1, P_2, \ldots, P_n\}$ and a collection $E$ of unor-
dered pairs of nodes called "edges," $E = \{\{P_{i_1}, P_{j_1}\}, \{P_{i_2}, P_{j_2}\}, \ldots\}$. The
graph $\Gamma$ can be described very succinctly by its so-called adjacency matrix
$A = [a_{ij}]$. Here,

$$a_{ij} = \begin{cases} 1 & \text{if } \{P_i, P_j\} \in E \\ 0 & \text{otherwise} \end{cases}$$

Since $\Gamma$ is undirected, $A$ is a real symmetric matrix; that is, $A^T = A$.


4.0.6 Example. Let $A = [a_{ij}] \in M_n$ be a real matrix and consider the
real bilinear form

$$Q(x, y) = y^TAx = \sum_{i,j=1}^{n} a_{ij}\,y_ix_j, \qquad x, y \in \mathbf{R}^n \qquad (4.0.7)$$

which reduces to the ordinary inner product when $A = I$. If we want to
have $Q(x, y) = Q(y, x)$ for all $x, y$, then it is necessary and sufficient that
$a_{ij} = a_{ji}$ for all $i, j = 1, \ldots, n$. To show this, it suffices to observe that if
$x = e_j$ and $y = e_i$, then $Q(e_j, e_i) = a_{ij}$ and $Q(e_i, e_j) = a_{ji}$. Thus, symmetric
real bilinear forms are naturally associated with symmetric real matrices.

Now let $A = [a_{ij}] \in M_n$ be a real or complex matrix, and consider the
complex form

$$H(x, y) = y^*Ax = \sum_{i,j=1}^{n} a_{ij}\,\bar{y}_ix_j, \qquad x, y \in \mathbf{C}^n \qquad (4.0.8)$$

which, like (4.0.7), reduces to the ordinary inner product when $A = I$. This
form is no longer bilinear but is linear in the first variable and "conjugate
linear" in the second variable ($H(ax, by) = a\bar{b}H(x, y)$), just like the com-
plex Euclidean inner product. Such forms are sometimes called sesqui-
linear. If we want to have $H(x, y) = \overline{H(y, x)}$ like the inner product, then
the same argument as in the previous case shows that it is necessary and
sufficient to have $a_{ij} = \bar{a}_{ji}$; that is, $A = \bar{A}^T = A^*$. Notice that if $A$ is real,
then $A^* = A^T$.

The class of matrices $A \in M_n$ such that $A = A^*$ is in many respects the
natural generalization to $M_n(\mathbf{C})$ of the class of real symmetric matrices.
Such matrices are called Hermitian; notice that a real Hermitian matrix
is a real symmetric matrix. The class of complex nonreal symmetric
matrices fails to have many important properties of the class of real sym-
metric matrices. In this chapter we shall study complex Hermitian and
symmetric matrices and will indicate by specialization what happens in
the real symmetric case.


4.1 Definitions, properties, and characterizations of Hermitian 
matrices 


4.1.1 Definition. A matrix $A = [a_{ij}] \in M_n$ is said to be Hermitian if
$A = A^*$, where $A^* = \bar{A}^T = [\bar{a}_{ji}]$. It is skew-Hermitian if $A = -A^*$.
Some observations for $A, B \in M_n$:


1. $A + A^*$, $AA^*$, and $A^*A$ are all Hermitian for all $A \in M_n$.

2. If $A$ is Hermitian, then $A^k$ is Hermitian for all $k = 1, 2, 3, \ldots$. If $A$
is nonsingular as well, then $A^{-1}$ is Hermitian.

3. If $A, B$ are Hermitian, then $aA + bB$ is Hermitian for all real
scalars $a, b$.

4. $A - A^*$ is skew-Hermitian for all $A \in M_n$.

5. If $A, B$ are skew-Hermitian, then $aA + bB$ is skew-Hermitian for
all real scalars $a, b$.

6. If $A$ is Hermitian, then $iA$ is skew-Hermitian.

7. If $A$ is skew-Hermitian, then $iA$ is Hermitian.

8. Any $A \in M_n$ can be written as

$$A = \tfrac{1}{2}(A + A^*) + \tfrac{1}{2}(A - A^*) \equiv H(A) + S(A)$$

where $H(A) = \tfrac{1}{2}(A + A^*)$ is the Hermitian part of $A$, and $S(A) =
\tfrac{1}{2}(A - A^*)$ is the skew-Hermitian part of $A$.

9. If $A$ is Hermitian, the main diagonal entries of $A$ are all real. In
order to specify the $n^2$ elements of $A$ one may specify freely any
$n$ real numbers (for the main diagonal entries) and any $\tfrac{1}{2}n(n-1)$
complex numbers (for the off-diagonal entries).


4.1.2 Theorem. Each $A \in M_n$ can be written uniquely as $A = S + iT$,
where both $S$ and $T$ are Hermitian. It can also be written uniquely as
$A = B + C$, where $B$ is Hermitian and $C$ is skew-Hermitian.


Proof: Write $A = \tfrac{1}{2}(A + A^*) + i[(-i/2)(A - A^*)]$ and observe that both
$S = \tfrac{1}{2}(A + A^*)$ and $T = (-i/2)(A - A^*)$ are Hermitian. For the uniqueness
assertion, observe that if $A = E + iF$ with both $E$ and $F$ Hermitian, then

$$2S = A + A^* = (E + iF) + (E + iF)^* = E + iF + E^* - iF^* = 2E$$

so $E = S$. Similarly, one shows that $F = T$. The assertions about the repre-
sentation $A = B + C$ are proved in the same way. □
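
The decomposition used in this proof, $S = \tfrac{1}{2}(A + A^*)$ and $T = (-i/2)(A - A^*)$, can be computed directly. The sketch below does this with Python's built-in complex numbers; the function names `adjoint`, `hermitian_parts`, and `is_hermitian` are hypothetical, and the sample matrix is chosen only so that every intermediate value is exactly representable in floating point.

```python
def adjoint(a):
    """Return the conjugate transpose A* of a square matrix (list of rows)."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def hermitian_parts(a):
    """Return (S, T) with S = (A + A*)/2 and T = (-i/2)(A - A*), so A = S + iT."""
    n = len(a)
    star = adjoint(a)
    S = [[(a[i][j] + star[i][j]) / 2 for j in range(n)] for i in range(n)]
    T = [[-0.5j * (a[i][j] - star[i][j]) for j in range(n)] for i in range(n)]
    return S, T


def is_hermitian(a):
    """Exact test A == A*; adequate here because the sample data are exact."""
    return a == adjoint(a)


A = [[1 + 2j, 3], [1j, 4]]
S, T = hermitian_parts(A)

# S and T are both Hermitian, and S + iT reproduces A entry by entry.
recombined = [[S[i][j] + 1j * T[i][j] for j in range(2)] for i in range(2)]
print(is_hermitian(S), is_hermitian(T), recombined == A)
```

For general floating-point data one would compare with a tolerance rather than with exact equality.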


The foregoing observations suggest that if one thinks of $M_n$ as being
analogous to the complex numbers, then the Hermitian matrices are
analogous to the real numbers. The analog of the operation of complex
conjugation in $\mathbf{C}$ is the $*$ operation (adjoint) on $M_n$. A real number is a
complex number $z$ such that $z = \bar{z}$; a Hermitian matrix is a matrix $A \in M_n$
such that $A = A^*$. Just as every complex number $z$ can be written as
$z = s + it$ with $s, t \in \mathbf{R}$, every complex matrix $A$ can be written uniquely as
$A = S + iT$ with $S$ and $T$ Hermitian. There are some further properties
that strengthen this analogy.


4.1.3 Theorem. Let $A \in M_n$ be Hermitian. Then


(a) $x^*Ax$ is real for all $x \in \mathbf{C}^n$;
(b) All the eigenvalues of $A$ are real; and
(c) $S^*AS$ is Hermitian for all $S \in M_n$.


Proof: One computes $\overline{(x^*Ax)} = (x^*Ax)^* = x^*A^*x = x^*Ax$, so $x^*Ax$ equals
its complex conjugate and hence is real. If $Ax = \lambda x$ and $x^*x = 1$, then
$\lambda = \lambda x^*x = x^*\lambda x = x^*Ax$ is real by (a). Finally, $(S^*AS)^* = S^*A^*S = S^*AS$,
so $S^*AS$ is always Hermitian. □


Exercise. What does each of the foregoing properties of a Hermitian
matrix $A \in M_n$ mean when $n = 1$?


Each of the properties in (4.1.3) is actually (almost) a characterization 
of Hermitian matrices. 


4.1.4 Theorem. Let $A = [a_{ij}] \in M_n$ be given. Then $A$ is Hermitian if
and only if at least one of the following holds:


(a) $x^*Ax$ is real for all $x \in \mathbf{C}^n$;
(b) $A$ is normal and all the eigenvalues of $A$ are real; or
(c) $S^*AS$ is Hermitian for all $S \in M_n$.


Proof: It suffices to prove only the sufficiency of each condition. If $x^*Ax$ is
real for all $x \in \mathbf{C}^n$, then $(x+y)^*A(x+y) = (x^*Ax + y^*Ay) + (x^*Ay + y^*Ax)$
is real for all $x, y \in \mathbf{C}^n$. Since $x^*Ax + y^*Ay$ is real by assumption, we con-
clude that $x^*Ay + y^*Ax$ is real for all $x, y \in \mathbf{C}^n$. If we choose $x = e_k$ and
$y = e_j$, this says that $a_{kj} + a_{jk}$ is real, so $\mathrm{Im}\, a_{kj} = -\mathrm{Im}\, a_{jk}$. If we choose $x =
ie_k$ and $y = e_j$, this says that $-ia_{kj} + ia_{jk}$ is real, so $\mathrm{Re}\, a_{kj} = \mathrm{Re}\, a_{jk}$. These
two identities together are equivalent to having $a_{kj} = \bar{a}_{jk}$, and since $j, k$ are
arbitrary we conclude that $A = A^*$.

If $A$ is normal, it is unitarily diagonalizable, so $A = U\Lambda U^*$ with $\Lambda =
\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, a diagonal matrix formed from the eigenvalues of $A$.
In general, we have $A^* = U\bar{\Lambda}U^*$, but if $\Lambda$ is real, we have $A^* = U\Lambda U^* = A$.
The last condition implies that $A$ is Hermitian by choosing $S = I$. □


Since a Hermitian matrix is obviously normal ($AA^* = A^2 = A^*A$), all
the results about normal matrices in Chapter 2 apply. For example,
eigenvectors corresponding to distinct eigenvalues are orthogonal; there is
a complete set of orthonormal eigenvectors; Hermitian matrices are
unitarily diagonalizable; and so forth.

For reference, we formally state the following important result.


4.1.5 Theorem (the spectral theorem for Hermitian matrices). Let $A \in
M_n$ be given. Then $A$ is Hermitian if and only if there is a unitary matrix
$U \in M_n$ and a real diagonal matrix $\Lambda \in M_n$ such that $A = U\Lambda U^*$. More-
over, $A$ is real and Hermitian (i.e., real symmetric) if and only if there is
a real orthogonal matrix $P \in M_n$ and a real diagonal matrix $\Lambda \in M_n$ such
that $A = P\Lambda P^T$.


Although a real linear combination of Hermitian matrices is always
Hermitian, a complex linear combination need not be. For example, if $A$
is Hermitian, $iA$ is Hermitian only if $A = 0$. Furthermore, if $A$ and $B$ are
Hermitian, then $(AB)^* = B^*A^* = BA$, so $AB$ is Hermitian if and only if $A$
and $B$ commute.

One of the most famous results about commuting Hermitian matrices
(because of an important generalization to operators in quantum me-
chanics) is the following special case of Theorem (2.5.5).


4.1.6 Theorem. Let $\mathcal{F}$ be a given family of Hermitian matrices. There
exists a unitary matrix $U$ such that $UAU^*$ is diagonal for all $A \in \mathcal{F}$ if and
only if $AB = BA$ for all $A, B \in \mathcal{F}$.


A Hermitian matrix A has the property that A is equal to A*. One way 
to generalize the notion of a Hermitian matrix is to consider the class of 
matrices such that A is similar to A*. The following theorem charac- 
terizes this class in several ways, the first of which says that such matrices 
must be similar, but not necessarily unitarily similar, to a real, but not 
necessarily diagonal, matrix. 


4.1.7 Theorem. Let $A \in M_n$ be given. The following statements are
equivalent:


(a) $A$ is similar to a matrix $B \in M_n(\mathbf{R})$;

(b) $A$ is similar to $A^*$;

(c) $A$ is similar to $A^*$ via a Hermitian similarity transformation;

(d) $A = HK$, in which $H, K \in M_n$ are Hermitian with at least one
nonsingular; and

(e) $A = HK$, in which $H, K \in M_n$ are Hermitian.


Proof: First note that (a) and (b) are equivalent: if (a) holds, then
$S^{-1}AS = B = T^{-1}B^TT = T^{-1}B^*T = T^{-1}S^*A^*(S^{-1})^*T$, which means that
$A^* = (ST^{-1}S^*)^{-1}A(ST^{-1}S^*)$ or that (b) holds. If (b) holds, then $A$ and
$A^*$ have the same Jordan canonical form. Since $A$ and $A^T$ are similar for
any matrix, this means that if $J$ is the Jordan matrix of $A$, then $J$ must be
similar to $\bar{J}$. Consequently, for each Jordan block $J_k(\lambda)$ in $J$ there is a
corresponding Jordan block $J_k(\bar{\lambda})$ (of the same size) in $J$. If $\lambda$ is real, this
gives no information, but if $\lambda$ is not real, it means that the Jordan blocks
of $A$ corresponding to each nonreal eigenvalue and its conjugate must
occur in matching pairs. Using the argument leading to (3.4.5), we con-
clude that $J$ must be similar to a direct sum of real matrices of the form
(3.4.4), and hence (a) follows.


To verify that (b) implies (c), suppose that $S^{-1}AS = A^*$ and observe
that $T^{-1}AT = A^*$ if $T = \alpha S$ for any nonzero $\alpha = re^{i\theta} \in \mathbf{C}$. Thus, $AT = TA^*$
or, equivalently, $AT^* = T^*A^*$. Adding these two identities produces the
identity $A(T + T^*) = (T + T^*)A^*$, and if $T + T^*$ were nonsingular, this
would mean that $A$ is similar to $A^*$ via the Hermitian matrix $T + T^*$. But
$\alpha$ may be chosen so as to make $T + T^*$ nonsingular, since $T + T^*$ is non-
singular if and only if $T^{-1}(T + T^*) = I + T^{-1}T^*$ is, if and only if $-1 \notin
\sigma(T^{-1}T^*)$. However, $T^{-1}T^* = e^{-2i\theta}S^{-1}S^*$, and since $\alpha$ may be chosen to
produce any $\theta \in [0, 2\pi)$, we need only pick $\theta$ so that $-e^{2i\theta} \notin \sigma(S^{-1}S^*)$.
Thus, (b) implies (c).

Next suppose that (c) holds and write $R^{-1}AR = A^*$ with $R \in M_n$ non-
singular and Hermitian. Then $R^{-1}A = A^*R^{-1}$ and $A = R(A^*R^{-1})$. But
$(A^*R^{-1})^* = R^{-1}A = A^*R^{-1}$, so that $A$ is the product of the two Hermitian
matrices $R$ and $A^*R^{-1}$, of which $R$ is nonsingular, and (d) holds.

If (d) holds and $A = HK$ with $H$ nonsingular, then $H^{-1}AH = KH =
(HK)^* = A^*$, and (b) holds. The argument is similar if $K$ is nonsingular.
Obviously (d) implies (e); we shall show that (e) implies (a). If $A =
HK$ with $H$ and $K$ Hermitian and both singular, consider $U^*AU =
(U^*HU)(U^*KU)$, where $U \in M_n$ is unitary and diagonalizes $H$ in the form

$$U^*HU = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix} \equiv H'$$

for a nonsingular diagonal matrix $D \in M_k$, $k < n$. Partition the matrix
$U^*KU$ conformally with $H'$ so that

$$U^*AU = H'(U^*KU) = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} K' & * \\ * & * \end{bmatrix}
= \begin{bmatrix} DK' & * \\ 0 & 0 \end{bmatrix}$$

The term $DK' \in M_k$ is the product of two Hermitian matrices, one of
which is nonsingular, so by the equivalence of (d) and (a) it is similar to
a real matrix $B \in M_k$. Denote the Jordan canonical form of $B$ by $J \in M_k$
so that $A$ is similar to a matrix $C$ of the form

$$C = \begin{bmatrix} J & * \\ 0 & 0 \end{bmatrix}$$

The matrix $C$ is upper triangular, and its eigenvalues are the eigenvalues
of $J$ together with $n - k$ additional zero eigenvalues. The Jordan block
structure of the Jordan canonical form of $C$ must be the same as that of
$J$ for the blocks associated with any nonzero eigenvalues because if
$\lambda \ne 0$, the (column) rank of each power $(C - \lambda I)^r$ is evidently equal to
$n - k + \mathrm{rank}(J - \lambda I)^r$, $r = 1, 2, \ldots, n$. In particular, the Jordan blocks of
$C$ associated with any nonreal eigenvalues must occur in matching con-
jugate pairs, and hence the Jordan canonical form of $C$ is similar to a
matrix of the form (3.4.6), which is real. □


Problems 


1. Show that every principal submatrix of a Hermitian matrix is Her- 
mitian. Does this property hold for skew-Hermitian matrices? for normal 
matrices? 


2. If $A \in M_n$ is Hermitian and $S \in M_n$, show that $SAS^*$ is Hermitian.
What about $SAS^{-1}$ (if $S$ is nonsingular)?


3. Let $A, B \in M_n$ be Hermitian. Show that $A$ and $B$ are similar if and
only if they are unitarily similar. Hint: If $A = SBS^{-1}$, show that $A =
U\Lambda U^*$ and $B = V\Lambda V^*$ with $U$ and $V$ unitary, so $U^*AU = \Lambda = V^*BV$.


4. Verify the properties 1-9 following (4.1.1). 


5. Sometimes one can show that a matrix has only real eigenvalues by
showing that it is similar to a Hermitian matrix. A classic example of this
is the following: Let $A = [a_{ij}] \in M_n(\mathbf{R})$ be tridiagonal; that is, $a_{ij} = 0$ if
$|i - j| > 1$. Suppose that the entries have the very weak symmetry prop-
erty that $a_{i,i+1}a_{i+1,i} > 0$ for all $i = 1, 2, \ldots, n-1$. Show that there is a real
diagonal matrix $D$ with positive diagonal entries such that $DAD^{-1}$ is sym-
metric, and conclude that $A$ has only real eigenvalues. Consider
$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$
and explain why the assumption on the signs of the off-diagonal entries is
necessary. Use a limit argument to show that the conclusion that the
eigenvalues are real continues to hold if $a_{i,i+1}a_{i+1,i} \ge 0$.


6. Show that every matrix $A \in M_n$ is uniquely determined by the Her-
mitian form $x^*Ax$ that it generates, in the following sense: If $A = [a_{ij}]$,
$B = [b_{ij}] \in M_n$ are given, show that $x^*Ax = x^*Bx$ for all $x \in \mathbf{C}^n$ if and
only if $A = B$. Hint: If $x^*Ax = 0$ for all $x \in \mathbf{C}^n$, consider $(x + y)^*A(x + y)$
and show that $x^*Ay + y^*Ax = 0$ for all $x, y \in \mathbf{C}^n$. Choose $x = e_k$, $y = e^{i\theta}e_j$,
$\theta \in \mathbf{R}$ to show that $a_{kj}e^{2i\theta} = -a_{jk}$ for all $\theta \in \mathbf{R}$ and all $j, k = 1, 2, \ldots, n$.


7. Show that a matrix $A \in M_n$ is not uniquely determined by the qua-
dratic form $x^TAx$ that it generates if $n \ge 2$; that is, if $n \ge 2$ there are
$A, B \in M_n$ with $A \ne B$ such that $x^TAx = x^TBx$ for all $x \in \mathbf{C}^n$. Hint: If
$C = -C^T$, what is $x^TCx$?


8. Show that a matrix A € M, is not determined by the absolute value of 


the Hermitian form |x*Ax| that it generates. Hint: Let A= [ o i] and 
show that $|x^*Ax| = |x^*A^Tx|$ for all $x \in \mathbf{C}^2$.


9. Show that a matrix $A \in M_n$ is almost determined by the absolute value
of the Hermitian bilinear form that it generates, in the following sense: If
$A, B \in M_n$ are given, show that $|x^*Ay| = |x^*By|$ for all $x, y \in \mathbf{C}^n$ if and only
if $A = e^{i\theta}B$ for some $\theta \in \mathbf{R}$. Hint: Let $A = [a_{ij}]$ and $B = [b_{ij}]$. Use $x = e_i$
and $y = e_j$ to show $|a_{ij}| = |b_{ij}|$ for all $i, j = 1, \ldots, n$. Let $x = e_i$, $y = se_j + te_k$
to show $|sa_{ij} + ta_{ik}|^2 = |sb_{ij} + tb_{ik}|^2$ and hence $\mathrm{Re}(s\bar{t}\,[a_{ij}\bar{a}_{ik} - b_{ij}\bar{b}_{ik}]) = 0$
for all $s, t \in \mathbf{C}$. Deduce that $a_{ij}/b_{ij} = a_{ik}/b_{ik}$ if $b_{ij}b_{ik} \ne 0$.


10. Show that $A \in M_n$ is Hermitian if and only if $iA$ is skew-Hermitian.
Deduce that the eigenvalues of a skew-Hermitian matrix are all pure
imaginary and that the eigenvalues of the square of a skew-Hermitian
matrix are real and nonpositive.


11. If $A, B \in M_n$ are Hermitian, show that $\mathrm{tr}\,(AB)^2 \le \mathrm{tr}\, A^2B^2$. Hint:
Show that $AB - BA$ is skew-Hermitian and consider $\mathrm{tr}\,(AB - BA)^2$.


12. If $A \in M_n$ is Hermitian, show that the rank of $A$ is equal to the
number of nonzero eigenvalues of $A$, but that this is not generally true
for non-Hermitian matrices. Hint: $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$.


13. If $A \in M_n$ is Hermitian and $A \ne 0$, show that

$$\mathrm{rank}(A) \ge \frac{[\mathrm{tr}\, A]^2}{\mathrm{tr}\, A^2}$$

with equality if and only if there is a matrix $U = [u_1 \; \cdots \; u_r] \in M_{n,r}$ with
orthonormal columns and some $a \in \mathbf{R}$ such that $A = aUU^*$; that is, $A$ is
a real scalar multiple of a unitary projection. Hint: If $\lambda_1, \ldots, \lambda_r$ are the
nonzero eigenvalues of $A$, the Cauchy-Schwarz inequality says that

$$[\mathrm{tr}\, A]^2 = \left( \sum_{i=1}^{r} \lambda_i \right)^2 \le r \sum_{i=1}^{r} \lambda_i^2 = r\, \mathrm{tr}\, A^2$$

with equality if and only if all $\lambda_i$ are equal.


14. A skew-Hermitian matrix $A \in M_n$ satisfies the identity $A = -A^*$. If
$\theta \in \mathbf{R}$, show that $A = e^{i\theta}A^*$ if and only if $e^{-i\theta/2}A$ is Hermitian. What
is this for $\theta = \pi$? For $\theta = 0$? Explain why the class of skew-Hermitian
matrices may be thought of as one of infinitely many classes of "gen-
eralized Hermitian" matrices, and describe the structure of each such class.


15. Let the Hermitian matrix $A = [a_{ij}] \in M_n$ be written in partitioned
form as

$$A = \begin{bmatrix} a_{11} & x^* \\ x & \hat{A} \end{bmatrix}$$

where $x \in \mathbf{C}^{n-1}$ and $\hat{A} \in M_{n-1}$. Show that

$$\det A = a_{11} \det \hat{A} - x^*(\mathrm{adj}\, \hat{A})x$$

where $\mathrm{adj}\, \hat{A}$ is the classical adjoint of $\hat{A}$ [see (0.8.2)]. What weaker hy-
pothesis on $A$ is sufficient for this formula to hold? Hint: Use the Laplace
cofactor expansion (0.3.1) for the determinant down the first column of $A$
and then across the first row of the cofactors obtained.


4.2 Variational characterizations of eigenvalues of Hermitian
matrices


For a general matrix $A \in M_n$, about the only characterization of the eigen-
values is the fact that they are the roots of the characteristic equation
$p_A(t) = 0$. For Hermitian matrices, however, the eigenvalues can be char-
acterized as the solutions of a series of optimization problems.

Since the eigenvalues of a Hermitian matrix $A \in M_n$ are real, we shall
adopt the convention that they are labeled according to increasing (non-
decreasing) size:

$$\lambda_{\min} = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{n-1} \le \lambda_n = \lambda_{\max} \qquad (4.2.1)$$

The smallest and largest eigenvalues are easily characterized as the solu-
tions to a constrained minimum and maximum problem. The characteri-
zation theorem bears the names of two British physicists, and the expres-
sion $x^*Ax/x^*x$ is known as a Rayleigh-Ritz ratio.


4.2.2 Theorem (Rayleigh-Ritz). Let $A \in M_n$ be Hermitian, and let the
eigenvalues of $A$ be ordered as in (4.2.1). Then

$$\lambda_1 x^*x \le x^*Ax \le \lambda_n x^*x \qquad \text{for all } x \in \mathbf{C}^n$$

$$\lambda_{\max} = \lambda_n = \max_{x \ne 0} \frac{x^*Ax}{x^*x} = \max_{x^*x = 1} x^*Ax$$

$$\lambda_{\min} = \lambda_1 = \min_{x \ne 0} \frac{x^*Ax}{x^*x} = \min_{x^*x = 1} x^*Ax$$


Proof: Since $A$ is Hermitian, there is a unitary matrix $U \in M_n$ such that
$A = U\Lambda U^*$ with $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. For any $x \in \mathbf{C}^n$ we have

$$x^*Ax = x^*U\Lambda U^*x = (U^*x)^*\Lambda(U^*x) = \sum_{i=1}^{n} \lambda_i\,|(U^*x)_i|^2$$

Since each term $|(U^*x)_i|^2$ is nonnegative, we have

$$\lambda_{\min} \sum_{i=1}^{n} |(U^*x)_i|^2 \le x^*Ax = \sum_{i=1}^{n} \lambda_i\,|(U^*x)_i|^2
\le \lambda_{\max} \sum_{i=1}^{n} |(U^*x)_i|^2$$

Because $U$ is unitary,

$$\sum_{i=1}^{n} |(U^*x)_i|^2 = \sum_{i=1}^{n} |x_i|^2 = x^*x$$

and hence we have shown that

$$\lambda_1 x^*x = \lambda_{\min} x^*x \le x^*Ax \le \lambda_{\max} x^*x = \lambda_n x^*x \qquad (4.2.3)$$

These inequalities are sharp, for if $x$ is an eigenvector of $A$ corresponding
to the eigenvalue $\lambda_1$, then $x^*Ax = x^*\lambda_1x = \lambda_1x^*x$, and similarly for $\lambda_n$.
The remaining assertions follow easily from (4.2.3). If $x \ne 0$, we have

$$\frac{x^*Ax}{x^*x} \le \lambda_n$$

with equality when $x$ is a $\lambda_n$ eigenvector of $A$, so

$$\max_{x \ne 0} \frac{x^*Ax}{x^*x} = \lambda_n \qquad (4.2.4)$$

Finally, if $x \ne 0$, we have

$$\frac{x^*Ax}{x^*x} = \left( \frac{x}{\sqrt{x^*x}} \right)^{\!*} A \left( \frac{x}{\sqrt{x^*x}} \right)
\quad \text{and} \quad
\left( \frac{x}{\sqrt{x^*x}} \right)^{\!*} \left( \frac{x}{\sqrt{x^*x}} \right) = 1$$

so the condition (4.2.4) is equivalent to the condition

$$\max_{x^*x = 1} x^*Ax = \lambda_n \qquad (4.2.5)$$

The arguments for $\lambda_1$ are similar. □


The geometrical interpretation of (4.2.5) is that $\lambda_n$ is the largest value
of the function $x^*Ax$ as $x$ ranges over the unit sphere in $\mathbf{C}^n$, a compact
set. The bounds (4.2.3) give the following eigenvalue inclusion result.


4.2.6 Corollary. Let $A \in M_n$ be a given Hermitian matrix, let $x \in \mathbf{C}^n$ be
a given nonzero vector, and let $\alpha = x^*Ax/x^*x$. Then there is at least one
eigenvalue of $A$ in the interval $(-\infty, \alpha]$ and at least one in $[\alpha, \infty)$.


Exercise. Prove Corollary (4.2.6).


Exercise. What are the analog and the geometrical interpretation of
(4.2.5) for $\lambda_1$?


The Rayleigh-Ritz theorem provides a variational characterization
of the largest and smallest eigenvalues of a Hermitian matrix $A$, but
what about the rest of the eigenvalues? Suppose $A = U\Lambda U^*$ with $U =
[u_1 \; u_2 \; \ldots \; u_n]$; the columns of $U$ are the orthonormal eigenvectors of $A$. If
we consider only those vectors $x \in \mathbf{C}^n$ that are orthogonal to $u_1$, we have
the following modification in the main identity of (4.2.2):

$$x^*Ax = \sum_{i=1}^{n} \lambda_i\,|(U^*x)_i|^2 = \sum_{i=1}^{n} \lambda_i\,|u_i^*x|^2 = \sum_{i=2}^{n} \lambda_i\,|u_i^*x|^2$$

This is a nonnegative linear combination of $\lambda_2, \lambda_3, \ldots, \lambda_n$, and hence

$$x^*Ax = \sum_{i=2}^{n} \lambda_i\,|u_i^*x|^2 \ge \lambda_2 \sum_{i=2}^{n} |u_i^*x|^2
= \lambda_2 \sum_{i=1}^{n} |(U^*x)_i|^2 = \lambda_2\,x^*x$$

provided that $x$ is orthogonal to the first column of $U$. This inequality
becomes an equality if we choose $x = u_2$, so we have a characterization

$$\min_{\substack{x \ne 0 \\ x \perp u_1}} \frac{x^*Ax}{x^*x}
= \min_{\substack{x^*x = 1 \\ x \perp u_1}} x^*Ax = \lambda_2 \qquad (4.2.7)$$

of the second smallest eigenvalue.
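
The bounds (4.2.3) and the sharpness statement in the proof of (4.2.2) can be observed numerically. The sketch below uses a 2-by-2 Hermitian matrix chosen here for illustration (its eigenvalues are 1 and 3, with eigenvectors $(-i, 1)^T$ and $(i, 1)^T$); the function name `rayleigh` and the random sampling scheme are assumptions, not part of the text.

```python
import random


def rayleigh(a, x):
    """Return the Rayleigh-Ritz ratio x*Ax / x*x for Hermitian a and nonzero x."""
    n = len(x)
    num = sum(x[i].conjugate() * a[i][j] * x[j] for i in range(n) for j in range(n))
    den = sum(abs(v) ** 2 for v in x)
    # For Hermitian a the numerator is real, so only the real part is kept.
    return (num / den).real


A = [[2, 1j], [-1j, 2]]   # Hermitian, eigenvalues 1 and 3

# Sample random complex vectors; every ratio lies between the extreme eigenvalues.
random.seed(0)
samples = []
for _ in range(1000):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
    samples.append(rayleigh(A, x))

print(min(samples), max(samples))

# The bounds are attained at eigenvectors.
print(rayleigh(A, [-1j, 1]), rayleigh(A, [1j, 1]))
```

Random sampling only approaches the extreme values; the exact minimum and maximum are reached at the eigenvectors, in line with (4.2.4) and (4.2.5).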

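Exercise. Extend this argument to show that

$$\min_{\substack{x \ne 0 \\ x \perp u_1, u_2, \ldots, u_{k-1}}} \frac{x^*Ax}{x^*x}
= \min_{\substack{x^*x = 1 \\ x \perp u_1, u_2, \ldots, u_{k-1}}} x^*Ax = \lambda_k,
\qquad k = 2, 3, \ldots, n \qquad (4.2.8)$$


Exercise. Show that

$$\max_{\substack{x \ne 0 \\ x \perp u_n, u_{n-1}, \ldots, u_{n-k+1}}} \frac{x^*Ax}{x^*x}
= \max_{\substack{x^*x = 1 \\ x \perp u_n, u_{n-1}, \ldots, u_{n-k+1}}} x^*Ax = \lambda_{n-k},
\qquad k = 1, 2, \ldots, n-1 \qquad (4.2.9)$$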


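Unfortunately, these formulae are of little practical value because they require explicit knowledge of some of the eigenvectors, about which we usually have no information. But (4.2.7) and the related formulae (4.2.8) and (4.2.9) can be the starting point for developing a useful characterization. Let $w \in \mathbf{C}^n$ be a given vector. Then

$$\begin{aligned}
\sup_{\substack{x^*x=1 \\ x \perp w}} x^*Ax &= \sup_{\substack{x^*x=1 \\ x \perp w}} x^*U\Lambda U^*x = \sup_{\substack{x^*x=1 \\ x \perp w}} \sum_{i=1}^n \lambda_i |(U^*x)_i|^2 \\
&= \sup_{\substack{z^*z=1 \\ x = Uz \perp w}} \sum_{i=1}^n \lambda_i |z_i|^2 = \sup_{\substack{z^*z=1 \\ z \perp U^*w}} \sum_{i=1}^n \lambda_i |z_i|^2 \\
&\ge \sup_{\substack{z^*z=1 \\ z \perp U^*w \\ z_1 = z_2 = \cdots = z_{n-2} = 0}} \sum_{i=1}^n \lambda_i |z_i|^2 \\
&= \sup_{\substack{|z_{n-1}|^2 + |z_n|^2 = 1 \\ z \perp U^*w}} \left( \lambda_{n-1}|z_{n-1}|^2 + \lambda_n |z_n|^2 \right) \ge \lambda_{n-1}
\end{aligned} \tag{4.2.10}$$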




In the second line of this argument we set $z = U^*x$ and used the fact that $U$ is unitary to conclude that $z^*z = 1$ if $x^*x = 1$. The first inequality comes from the fact that if one restricts the set over which a supremum is taken, the value of the supremum cannot increase. The final inequality follows from the ordering $\lambda_n \ge \lambda_{n-1}$ and an often used property of convex combinations.

In the foregoing argument, the vector $w$ was fixed but arbitrary, and hence we may take the infimum of (4.2.10) over all $w$ to obtain

$$\inf_{w \in \mathbf{C}^n} \sup_{\substack{x^*x=1 \\ x \perp w}} x^*Ax \ge \lambda_{n-1}$$

We have seen, however, that equality holds in (4.2.10) if $w = u_n$, so we have

$$\inf_{w \in \mathbf{C}^n} \sup_{\substack{x^*x=1 \\ x \perp w}} x^*Ax = \lambda_{n-1}$$

a characterization that is somewhat more complicated in form than (4.2.7) but does not involve knowledge of any of the eigenvectors of $A$. Since $x^*Ax = \lambda_{n-1}$ for $x = u_{n-1}$, one often sees this formula with "max" instead of "sup" and with "min" instead of "inf." This is the basic idea behind the following Courant-Fischer "min-max theorem."


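4.2.11 Theorem (Courant-Fischer). Let $A \in M_n$ be a Hermitian matrix with eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, and let $k$ be a given integer with $1 \le k \le n$. Then

$$\min_{w_1, w_2, \ldots, w_{n-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, w_2, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} = \lambda_k \tag{4.2.12}$$

and

$$\max_{w_1, w_2, \ldots, w_{k-1} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, w_2, \ldots, w_{k-1}}} \frac{x^*Ax}{x^*x} = \lambda_k \tag{4.2.13}$$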


Remark: If k =n in (4.2.12) or k =1 in (4.2.13), we agree to omit the 
outer optimization, as the set over which the optimization takes place is 
empty. In these two cases the assertions reduce to the Rayleigh-Ritz 
theorem (4.2.2). 


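Proof: We consider only (4.2.12), as the argument for (4.2.13) is similar. Write $A = U\Lambda U^*$ with $U$ unitary and $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, and let $1 \le k \le n$. If $x \ne 0$,

$$\frac{x^*Ax}{x^*x} = \frac{(U^*x)^*\Lambda(U^*x)}{x^*x} = \frac{(U^*x)^*\Lambda(U^*x)}{(U^*x)^*(U^*x)}$$

and $\{U^*x : x \in \mathbf{C}^n \text{ and } x \ne 0\} = \{y \in \mathbf{C}^n : y \ne 0\}$. Thus, if $w_1, w_2, \ldots, w_{n-k} \in \mathbf{C}^n$ are given, we have

$$\begin{aligned}
\sup_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} &= \sup_{\substack{y \ne 0 \\ y \perp U^*w_1, \ldots, U^*w_{n-k}}} \frac{y^*\Lambda y}{y^*y} \\
&= \sup_{\substack{y^*y = 1 \\ y \perp U^*w_1, \ldots, U^*w_{n-k}}} \sum_{i=1}^n \lambda_i |y_i|^2 \\
&\ge \sup_{\substack{y^*y = 1 \\ y \perp U^*w_1, \ldots, U^*w_{n-k} \\ y_1 = y_2 = \cdots = y_{k-1} = 0}} \sum_{i=1}^n \lambda_i |y_i|^2 \\
&= \sup_{\substack{|y_k|^2 + |y_{k+1}|^2 + \cdots + |y_n|^2 = 1 \\ y \perp U^*w_1, \ldots, U^*w_{n-k}}} \sum_{i=k}^n \lambda_i |y_i|^2 \ge \lambda_k
\end{aligned}$$

This shows that

$$\sup_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} \ge \lambda_k$$

for any $n-k$ vectors $w_1, \ldots, w_{n-k}$. But (4.2.9) shows that equality holds for one choice of the vectors $w_i$, namely $w_i = u_{n-i+1}$, where $U = [u_1\ \cdots\ u_n]$. Therefore,

$$\inf_{w_1, \ldots, w_{n-k}} \ \sup_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} = \lambda_k$$

and we may replace "inf" and "sup" with "min" and "max" since the extremum is achieved. The argument for (4.2.13) is similar. □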


Exercise. Provide the details for a proof of (4.2.13). 


Problems 


1. Let $A \in M_n$ be a Hermitian matrix with eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Use (4.2.11) to show that

$$\lambda_k = \min_{S_k} \ \max_{\substack{x \in S_k \\ x \ne 0}} \frac{x^*Ax}{x^*x} = \max_{S_{n-k+1}} \ \min_{\substack{x \in S_{n-k+1} \\ x \ne 0}} \frac{x^*Ax}{x^*x}$$

where, in both cases, $S_j$ denotes a subspace of dimension $j$ and the outer optimization is over all subspaces of the indicated dimension.

2. If $A \in M_n$ is Hermitian, show that the following three optimization problems all have the same solution:

(a) $\max_{x^*x = 1} x^*Ax$

(b) $\max_{x \ne 0} \dfrac{x^*Ax}{x^*x}$

(c) $\max_{x^*Ax = 1} \dfrac{1}{x^*x}$, if at least one eigenvalue of $A$ is positive

3. If $A \in M_n$ is Hermitian and $x^*x = 1$, show that

$$\lambda_{\max} \ge x^*Ax \ge \lambda_{\min}$$
4. Show that the assumption that A is Hermitian is essential in (4.2.2) 
by considering A = [o i] . What is max{xTAx/x"x: 0 #xeR?}? What is 
max Ref{x*4x/x*x:0#xeC?)? 
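5. Let $A \in M_n$ have eigenvalues $\{\lambda_i\}$. Show that, even if $A$ is not Hermitian, one has the bounds

$$\min_{x \ne 0} \left| \frac{x^*Ax}{x^*x} \right| \le |\lambda_i| \le \max_{x \ne 0} \left| \frac{x^*Ax}{x^*x} \right|, \qquad i = 1, 2, \ldots, n$$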
Hint; Consider x = an eigenvector of A, and A = lo 1 | to show that 
neither bound need be sharp. 


> 1=1,2,...,n 


4.3 Some applications of the variational characterizations 


Among the many important applications of the Courant-Fischer theorem, one of the simplest is to the problem of comparing the eigenvalues of $A+B$ with those of $A$. We denote the eigenvalues of a matrix $A$ by $\lambda_i(A)$.


4.3.1 Theorem (Weyl). Let $A, B \in M_n$ be Hermitian and let the eigenvalues $\lambda_i(A)$, $\lambda_i(B)$, and $\lambda_i(A+B)$ be arranged in increasing order (4.2.1). For each $k = 1, 2, \ldots, n$ we have

$$\lambda_k(A) + \lambda_1(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_n(B) \tag{4.3.2}$$
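Proof: For any nonzero $x \in \mathbf{C}^n$ we have the bound

$$\lambda_1(B) \le \frac{x^*Bx}{x^*x} \le \lambda_n(B)$$

and hence for any $k = 1, 2, \ldots, n$ we have

$$\begin{aligned}
\lambda_k(A+B) &= \min_{w_1, \ldots, w_{n-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*(A+B)x}{x^*x} \\
&= \min_{w_1, \ldots, w_{n-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \left[ \frac{x^*Ax}{x^*x} + \frac{x^*Bx}{x^*x} \right] \\
&\ge \min_{w_1, \ldots, w_{n-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \left[ \frac{x^*Ax}{x^*x} + \lambda_1(B) \right] = \lambda_k(A) + \lambda_1(B)
\end{aligned}$$

A similar argument establishes the upper bound as well. □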
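Weyl's bounds (4.3.2) are easy to test on small examples. The Python sketch below is an illustration under stated assumptions: it restricts attention to 2-by-2 real symmetric matrices, stored as triples `(a, b, d)` for $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$, whose eigenvalues have the closed form $\tfrac12(a+d) \pm \sqrt{\tfrac14(a-d)^2 + b^2}$. The helper names `eig2` and `weyl_bounds_hold`, the tolerance, and the random test data are introduced here for the purpose of the example.

```python
import math
import random

def eig2(a, b, d):
    """Eigenvalues, in increasing order, of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return (mean - radius, mean + radius)

def weyl_bounds_hold(A, B, tol=1e-9):
    """Check both inequalities of (4.3.2) for k = 1, 2; A and B are triples (a, b, d)."""
    S = tuple(p + q for p, q in zip(A, B))          # entries of A + B
    lam_A, lam_B, lam_S = eig2(*A), eig2(*B), eig2(*S)
    return all(
        lam_A[k] + lam_B[0] - tol <= lam_S[k] <= lam_A[k] + lam_B[1] + tol
        for k in range(2)
    )

random.seed(1)
pairs = [
    (tuple(random.uniform(-5, 5) for _ in range(3)),
     tuple(random.uniform(-5, 5) for _ in range(3)))
    for _ in range(1000)
]
all_ok = all(weyl_bounds_hold(A, B) for A, B in pairs)
```

A random search of this kind cannot prove the theorem, but a single failure (beyond the rounding tolerance) would indicate an error in the implementation.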


Exercise. Show that equality may be attained in the bounds given in (4.3.1). Hint: Let $\{u_1, u_2, \ldots, u_n\}$ be an orthonormal set of eigenvectors of $A$ with $Au_i = \lambda_i(A)u_i$. Consider $B = \alpha u_i u_i^*$ for $\alpha > 0$ and for $\alpha < 0$.


Weyl's theorem gives two-sided bounds for the eigenvalues of $A+B$ for any Hermitian matrices $A$ and $B$. Further refinements can be obtained by restricting $B$ to have a special form, for example, positive definite, rank 1, rank $k$, or a bordering matrix.

A matrix $B \in M_n$ such that $x^*Bx \ge 0$ for all $x \in \mathbf{C}^n$ is said to be positive semidefinite; an equivalent condition is that $B$ be Hermitian and have all eigenvalues nonnegative (see Chapter 7). The following result, an immediate corollary of Weyl's theorem known as the monotonicity theorem, says that all the eigenvalues of a Hermitian matrix increase (in the weak sense that none decreases) if a positive semidefinite matrix is added to it.


4.3.3 Corollary. Let $A, B \in M_n$ be Hermitian. Assume that $B$ is positive semidefinite and that the eigenvalues of $A$ and $A+B$ are arranged in increasing order (4.2.1). Then

$$\lambda_k(A) \le \lambda_k(A+B) \quad \text{for all } k = 1, 2, \ldots, n$$

Proof: Use the lower bound in (4.3.2) and the fact that $\lambda_1(B) \ge 0$. □

If the matrix $B$ is of rank 1, the bounds on the eigenvalues of $A+B$ compared with those of $A$ are in the form of an interlacing theorem: between each successive pair of odd-numbered (or even-numbered) eigenvalues of $A+B$ there is at least one eigenvalue of $A$.

4.3.4 Theorem. Let $A \in M_n$ be Hermitian and let $z \in \mathbf{C}^n$ be a given vector. If the eigenvalues of $A$ and $A \pm zz^*$ are arranged in increasing order (4.2.1), we have

(a) $\lambda_k(A \pm zz^*) \le \lambda_{k+1}(A) \le \lambda_{k+2}(A \pm zz^*)$, $\quad k = 1, 2, \ldots, n-2$

(b) $\lambda_k(A) \le \lambda_{k+1}(A \pm zz^*) \le \lambda_{k+2}(A)$, $\quad k = 1, 2, \ldots, n-2$
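Proof: Let $1 \le k \le n-2$ and use (4.2.12) to write

$$\begin{aligned}
\lambda_{k+2}(A \pm zz^*) &= \min_{w_1, \ldots, w_{n-k-2} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k-2}}} \frac{x^*(A \pm zz^*)x}{x^*x} \\
&\ge \min_{w_1, \ldots, w_{n-k-2} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k-2} \\ x \perp z}} \frac{x^*(A \pm zz^*)x}{x^*x} \\
&= \min_{\substack{w_1, \ldots, w_{n-k-2} \in \mathbf{C}^n \\ w_{n-k-1} = z}} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k-2}, w_{n-k-1}}} \frac{x^*Ax}{x^*x} \\
&\ge \min_{w_1, \ldots, w_{n-k-2}, w_{n-k-1} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k-2}, w_{n-k-1}}} \frac{x^*Ax}{x^*x} \\
&= \lambda_{k+1}(A)
\end{aligned}$$

Now let $1 \le k \le n-1$ and use (4.2.13) to write

$$\begin{aligned}
\lambda_k(A \pm zz^*) &= \max_{w_1, \ldots, w_{k-1} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{k-1}}} \frac{x^*(A \pm zz^*)x}{x^*x} \\
&\le \max_{w_1, \ldots, w_{k-1} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{k-1} \\ x \perp z}} \frac{x^*(A \pm zz^*)x}{x^*x} \\
&= \max_{\substack{w_1, \ldots, w_{k-1} \in \mathbf{C}^n \\ w_k = z}} \ \min_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{k-1}, w_k}} \frac{x^*Ax}{x^*x} \\
&\le \max_{w_1, \ldots, w_{k-1}, w_k \in \mathbf{C}^n} \ \min_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{k-1}, w_k}} \frac{x^*Ax}{x^*x} = \lambda_{k+1}(A)
\end{aligned}$$

Taken together, these two families of inequalities yield the asserted inequalities. □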


If $B \in M_n$ is a Hermitian matrix, and if $B = U\Lambda U^*$ with $U = [u_1\ u_2\ \cdots\ u_n]$ unitary and $\Lambda = \operatorname{diag}(\beta_1, \beta_2, \ldots, \beta_n)$, then the rank of $B$ is equal to the number of nonzero eigenvalues. If $B$ has rank less than or equal to $r$, then we may assume that $\beta_{r+1} = \cdots = \beta_n = 0$. If the rank is less than $r$, then some of $\beta_1, \beta_2, \ldots, \beta_r$ will be zero as well. The expression

$$B = \sum_{i=1}^r \beta_i u_i u_i^* \tag{4.3.5}$$

is another way to write $B = U\Lambda U^*$. Conversely, any matrix of the form (4.3.5), with all $\beta_i \ne 0$ and $\{u_i\}$ an independent set, has rank $r$; if the $u_i$ terms are not known to be independent, then the rank of $B$ is at most $r$. The next result, a theorem of Weyl that has its origins in the theory of integral equations, gives bounds for the eigenvalues of $A+B$ when $B$ has rank $r$. It is an easy generalization of the rank 1 case (4.3.4).


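4.3.6 Theorem. Let $A, B \in M_n$ be Hermitian and suppose that $B$ has rank at most $r$. Then

(a) $\lambda_k(A+B) \le \lambda_{k+r}(A) \le \lambda_{k+2r}(A+B)$, $\quad k = 1, 2, \ldots, n-2r$

(b) $\lambda_k(A) \le \lambda_{k+r}(A+B) \le \lambda_{k+2r}(A)$, $\quad k = 1, 2, \ldots, n-2r$

(c) If $A = U\Lambda U^*$ with $U = [u_1\ u_2\ \cdots\ u_n] \in M_n$ unitary and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 \le \cdots \le \lambda_n$, and if

$$B = \lambda_n u_n u_n^* + \lambda_{n-1} u_{n-1} u_{n-1}^* + \cdots + \lambda_{n-r+1} u_{n-r+1} u_{n-r+1}^*$$

then $\lambda_{\max}(A - B) = \lambda_{n-r}(A)$.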


Proof: Let $B = \alpha_1 u_1 u_1^* + \cdots + \alpha_r u_r u_r^*$, where the set $\{u_1, \ldots, u_r\} \subset \mathbf{C}^n$ is not necessarily independent. The proofs of (a) and (b) are exactly the same as those of (a) and (b) in (4.3.4), except that where one imposed the single condition "$x \perp z$" before, one now imposes the $r$ conditions "$x \perp u_1, \ldots, u_r$" and completes the argument accordingly. For (c), observe that $u_1, \ldots, u_n$ are all eigenvectors of $A - B$, but $(A - B)u_k = 0$ for $k = n-r+1, n-r+2, \ldots, n$ and $(A - B)u_k = \lambda_k u_k$ for $k = 1, 2, \ldots, n-r$. Since $\lambda_{n-r} \ge \lambda_{n-r-1} \ge \cdots \ge \lambda_1$, the largest eigenvalue of $A - B$ is $\lambda_{n-r}$. □


Exercise. Provide the details for the proofs of (a) and (b) in (4.3.6).

The preceding result gives us enough information to derive the following general result of Weyl on the eigenvalues of a sum of Hermitian matrices.


4.3.7 Theorem (Weyl). Let $A, B \in M_n$ be Hermitian matrices, and let the eigenvalues of $A$, $B$, and $A+B$ be arranged in increasing order (4.2.1). Then for every pair of integers $j, k$ such that $1 \le j, k \le n$ and $j + k \ge n + 1$ we have

$$\lambda_{j+k-n}(A+B) \le \lambda_j(A) + \lambda_k(B)$$

and for every pair of integers $j, k$ such that $1 \le j, k \le n$ and $j + k \le n + 1$ we have

$$\lambda_j(A) + \lambda_k(B) \le \lambda_{j+k-1}(A+B)$$


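Proof: Let $j, k$ be given integers satisfying the first set of conditions. Let $A = U\Lambda(A)U^*$ and $B = V\Lambda(B)V^*$, where $U = [u_1\ u_2\ \cdots\ u_n] \in M_n$ and $V = [v_1\ v_2\ \cdots\ v_n] \in M_n$ are unitary, $\Lambda(A) = \operatorname{diag}(\lambda_1(A), \ldots, \lambda_n(A)) \in M_n$, and $\Lambda(B) = \operatorname{diag}(\lambda_1(B), \ldots, \lambda_n(B)) \in M_n$. Then

$$A_j = \lambda_n(A) u_n u_n^* + \lambda_{n-1}(A) u_{n-1} u_{n-1}^* + \cdots + \lambda_{j+1}(A) u_{j+1} u_{j+1}^*$$

has rank at most $n - j$, $B_k = \lambda_n(B) v_n v_n^* + \cdots + \lambda_{k+1}(B) v_{k+1} v_{k+1}^*$ has rank at most $n - k$, and $A_j + B_k$ has rank at most $2n - j - k$. Then $\lambda_n(A - A_j) = \lambda_j(A)$ and $\lambda_n(B - B_k) = \lambda_k(B)$ by (4.3.6c), and $\lambda_n(A - A_j + B - B_k) = \lambda_n(A + B - (A_j + B_k)) \ge \lambda_{n-(2n-j-k)}(A+B) = \lambda_{j+k-n}(A+B)$ by (4.3.6b) (with $k + r = n$ and $r = 2n - j - k$). Also,

$$\lambda_n(A - A_j + B - B_k) \le \lambda_n(A - A_j) + \lambda_n(B - B_k)$$

by (4.3.2) (with $k = n$). Thus, we have

$$\begin{aligned}
\lambda_j(A) + \lambda_k(B) &= \lambda_n(A - A_j) + \lambda_n(B - B_k) \ge \lambda_n(A - A_j + B - B_k) \\
&= \lambda_n((A+B) - (A_j + B_k)) \ge \lambda_{j+k-n}(A+B)
\end{aligned}$$

which is the first asserted inequality. The second set of inequalities follows directly from the first set when it is applied to $-A$ and $-B$. □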


Exercise. Provide the details for the deduction of the second set of inequalities in (4.3.7) from the first set. Hint: Apply (4.3.7) to obtain an upper bound for $\lambda_{j+k-n}(-A-B)$ and use the fact that $\lambda_i(-A) = -\lambda_{n-i+1}(A)$ if $A \in M_n$ is Hermitian.


As a final result of this type, we consider an interlacing theorem for the eigenvalues of $A+B$, where each of $A$ and $B$ is assumed to have a special form. The result, known as the interlacing eigenvalues theorem for bordered matrices, is similar to the case (4.3.4) in which $B$ is assumed to have rank 1.

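4.3.8 Theorem. Let $A \in M_n$ be a given Hermitian matrix, let $y \in \mathbf{C}^n$ be a given vector, and let $a \in \mathbf{R}$ be a given real number. Let $\hat{A} \in M_{n+1}$ be the Hermitian matrix obtained by bordering $A$ with $y$ and $a$ as follows:

$$\hat{A} = \begin{bmatrix} A & y \\ y^* & a \end{bmatrix}$$

Let the eigenvalues of $A$ and $\hat{A}$ be denoted by $\{\lambda_i\}$ and $\{\hat{\lambda}_i\}$, respectively, and assume that they have been arranged in increasing order $\lambda_1 \le \cdots \le \lambda_n$ and $\hat{\lambda}_1 \le \cdots \le \hat{\lambda}_n \le \hat{\lambda}_{n+1}$. Then

$$\hat{\lambda}_1 \le \lambda_1 \le \hat{\lambda}_2 \le \lambda_2 \le \cdots \le \lambda_{n-1} \le \hat{\lambda}_n \le \lambda_n \le \hat{\lambda}_{n+1} \tag{4.3.9}$$

Proof: Let an integer $k$ be given with $1 \le k \le n$. We shall prove that $\hat{\lambda}_k \le \lambda_k \le \hat{\lambda}_{k+1}$. Let $\hat{x} = [x^T\ \xi]^T \in \mathbf{C}^{n+1}$, $x \in \mathbf{C}^n$, $\xi \in \mathbf{C}$, and $\hat{w}_i = [w_i^T\ \omega_i]^T \in \mathbf{C}^{n+1}$, $w_i \in \mathbf{C}^n$, $\omega_i \in \mathbf{C}$. Use (4.2.12) of the Courant-Fischer theorem to write

$$\begin{aligned}
\hat{\lambda}_{k+1} &= \min_{\hat{w}_1, \ldots, \hat{w}_{n-k} \in \mathbf{C}^{n+1}} \ \max_{\substack{\hat{x} \ne 0,\ \hat{x} \in \mathbf{C}^{n+1} \\ \hat{x} \perp \hat{w}_1, \ldots, \hat{w}_{n-k}}} \frac{\hat{x}^*\hat{A}\hat{x}}{\hat{x}^*\hat{x}} \\
&\ge \min_{\hat{w}_1, \ldots, \hat{w}_{n-k} \in \mathbf{C}^{n+1}} \ \max_{\substack{\hat{x} \ne 0 \\ \hat{x} \perp \hat{w}_1, \ldots, \hat{w}_{n-k} \\ \hat{x} \perp e_{n+1}}} \frac{\hat{x}^*\hat{A}\hat{x}}{\hat{x}^*\hat{x}} \\
&= \min_{w_1, \ldots, w_{n-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} = \lambda_k
\end{aligned}$$

For the lower bound on $\lambda_k$ we use (4.2.13).

$$\begin{aligned}
\hat{\lambda}_k &= \max_{\hat{w}_1, \ldots, \hat{w}_{k-1} \in \mathbf{C}^{n+1}} \ \min_{\substack{\hat{x} \ne 0,\ \hat{x} \in \mathbf{C}^{n+1} \\ \hat{x} \perp \hat{w}_1, \ldots, \hat{w}_{k-1}}} \frac{\hat{x}^*\hat{A}\hat{x}}{\hat{x}^*\hat{x}} \\
&\le \max_{\hat{w}_1, \ldots, \hat{w}_{k-1} \in \mathbf{C}^{n+1}} \ \min_{\substack{\hat{x} \ne 0 \\ \hat{x} \perp \hat{w}_1, \ldots, \hat{w}_{k-1} \\ \hat{x} \perp e_{n+1}}} \frac{\hat{x}^*\hat{A}\hat{x}}{\hat{x}^*\hat{x}} \\
&= \max_{w_1, \ldots, w_{k-1} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, \ldots, w_{k-1}}} \frac{x^*Ax}{x^*x} = \lambda_k
\end{aligned}$$

□

We have seen two examples of interlacing theorems for eigenvalues: if a given Hermitian matrix is modified by adding a rank one Hermitian matrix or by bordering, then the new and old eigenvalues must interlace. What about the converse? If two interlacing sets of real numbers are given, can they be realized by a Hermitian matrix and a suitable modification? The answer is in the affirmative, and we give as an example a converse to Theorem (4.3.8).

4.3.10 Theorem. Let $n$ be a given positive integer, and let

$$\{\lambda_i : i = 1, 2, \ldots, n\} \quad \text{and} \quad \{\hat{\lambda}_i : i = 1, 2, \ldots, n+1\}$$

be two given sequences of real numbers such that

$$\hat{\lambda}_1 \le \lambda_1 \le \hat{\lambda}_2 \le \lambda_2 \le \cdots \le \lambda_{n-1} \le \hat{\lambda}_n \le \lambda_n \le \hat{\lambda}_{n+1}$$

Let $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$. There exists a real number $a$ and a real vector $y \in \mathbf{R}^n$ such that $\{\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_{n+1}\}$ is the set of eigenvalues of the real symmetric matrix

$$\hat{A} = \begin{bmatrix} \Lambda & y \\ y^T & a \end{bmatrix} \in M_{n+1}(\mathbf{R})$$

Proof: Obviously $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ is the set of eigenvalues of $\Lambda$, and since $\operatorname{tr} \hat{A} = \operatorname{tr} \Lambda + a$, we must have $a = \operatorname{tr} \hat{A} - \operatorname{tr} \Lambda = \sum_{i=1}^{n+1} \hat{\lambda}_i - \sum_{i=1}^n \lambda_i$. The characteristic polynomial $p_{\hat{A}}(t)$ of $\hat{A}$ is easily computed as

$$\begin{aligned}
p_{\hat{A}}(t) &= \det(tI - \hat{A}) = \det \begin{bmatrix} tI - \Lambda & -y \\ -y^T & t - a \end{bmatrix} \\
&= \det \left( \begin{bmatrix} I & 0 \\ y^T(tI - \Lambda)^{-1} & 1 \end{bmatrix} \begin{bmatrix} tI - \Lambda & -y \\ -y^T & t - a \end{bmatrix} \begin{bmatrix} I & (tI - \Lambda)^{-1}y \\ 0 & 1 \end{bmatrix} \right) \\
&= \det \begin{bmatrix} tI - \Lambda & 0 \\ 0 & (t - a) - y^T(tI - \Lambda)^{-1}y \end{bmatrix} \\
&= \left[ (t - a) - y^T(tI - \Lambda)^{-1}y \right] \det(tI - \Lambda) \\
&= \left[ (t - a) - \sum_{i=1}^n \frac{y_i^2}{t - \lambda_i} \right] \prod_{i=1}^n (t - \lambda_i)
\end{aligned} \tag{4.3.11}$$

We have already determined the necessary value of $a$, so it remains to be shown that $n$ real numbers $y_i$ can be found for (4.3.11) so that $p_{\hat{A}}(\hat{\lambda}_k) = 0$ for $k = 1, 2, \ldots, n+1$.

Define the polynomials

$$f(t) = \prod_{i=1}^{n+1} (t - \hat{\lambda}_i), \qquad \text{degree of } f = n+1 \tag{4.3.12}$$

$$g(t) = \prod_{i=1}^{n} (t - \lambda_i), \qquad \text{degree of } g = n \tag{4.3.13}$$

By the Euclidean algorithm we must have

$$f(t) = g(t)(t - c) + r(t)$$

where $c$ is a real number and $r(t)$ is a polynomial of degree at most $n-1$. By explicit computation we find that $c = \sum_{i=1}^{n+1} \hat{\lambda}_i - \sum_{i=1}^n \lambda_i = a$. Furthermore, $f(\lambda_k) = g(\lambda_k)(\lambda_k - a) + r(\lambda_k) = r(\lambda_k)$ for $k = 1, 2, \ldots, n$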




because $g(\lambda_k) = 0$. The polynomial $r(t)$ is therefore known at $n$ points and can be written explicitly in terms of Lagrange interpolating polynomials if the points of interpolation $\lambda_1, \ldots, \lambda_n$ are distinct. Under this assumption, $g(t)$ has only simple roots, and the Lagrange interpolation formula for $r(t)$ is

$$r(t) = \sum_{i=1}^n f(\lambda_i) \frac{g(t)}{g'(\lambda_i)(t - \lambda_i)}$$

Thus,

$$\frac{f(t)}{g(t)} = (t - a) + \frac{r(t)}{g(t)} = (t - a) - \sum_{i=1}^n \left[ \frac{-f(\lambda_i)}{g'(\lambda_i)} \right] \frac{1}{t - \lambda_i}$$

Because $f(\hat{\lambda}_k) = 0$ for all $k = 1, 2, \ldots, n+1$ we must have

$$(\hat{\lambda}_k - a) - \sum_{i=1}^n \left[ \frac{-f(\lambda_i)}{g'(\lambda_i)} \right] \frac{1}{\hat{\lambda}_k - \lambda_i} = 0, \qquad k = 1, 2, \ldots, n+1 \tag{4.3.14}$$

Notice that if $\hat{\lambda}_k = \lambda_i$ for $i = k-1$ or $k$, then the corresponding term $1/(t - \lambda_i)$ has a zero coefficient and there is no singularity at $t = \hat{\lambda}_k$. If we can set $y_i^2 = -f(\lambda_i)/g'(\lambda_i)$ for $i = 1, 2, \ldots, n$, then (4.3.11) guarantees that $p_{\hat{A}}(\hat{\lambda}_k) = 0$ and we are done. We must show, therefore, that $f(\lambda_i)/g'(\lambda_i) \le 0$ for $i = 1, 2, \ldots, n$, and it is now that the interlacing assumption must be used. Using the definitions of $f(t)$ and $g(t)$ and the interlacing assumption, we find that

$$f(\lambda_i) = (-1)^{n-i+1} \prod_{j=1}^{n+1} |\lambda_i - \hat{\lambda}_j|$$

$$g'(\lambda_i) = (-1)^{n-i} \prod_{\substack{j=1 \\ j \ne i}}^{n} |\lambda_i - \lambda_j|$$

and hence $f(\lambda_i)$ and $g'(\lambda_i)$ always have opposite signs.

It is easy to modify the argument to cover the case in which some of the $\lambda_i$ terms coincide. If, for example, $\lambda_1 = \lambda_2 = \cdots = \lambda_k < \lambda_{k+1} \le \cdots$ for some $k \ge 2$, then $\hat{\lambda}_2 = \cdots = \hat{\lambda}_k = \lambda_1$. The polynomial $f(t)$ in (4.3.12) has a factor $(t - \hat{\lambda}_1)(t - \lambda_1)^{k-1}$; the polynomial $g(t)$ in (4.3.13) has a factor $(t - \lambda_1)^k$ and $k$ is the exact multiplicity of $\lambda_1$ as a zero of $g(t)$. We may therefore modify $f(t)$, $g(t)$, and $r(t)$ by dividing each by $(t - \lambda_1)^{k-1}$. The modified polynomial $g(t)$ will have $\lambda_1$ as a simple zero. If we proceed in this way to remove all the multiple roots of $g(t)$, the argument can proceed as before, and the conclusion is the same. □
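The proof of (4.3.10) is constructive when the $\lambda_i$ are distinct, and the recipe is short enough to try by hand or by machine. The following Python sketch is only an illustration: the function names `border_data` and `char_poly` and the sample data are assumptions made here. It computes $a = \sum \hat{\lambda}_i - \sum \lambda_i$ and $y_i = \sqrt{-f(\lambda_i)/g'(\lambda_i)}$, and then evaluates the characteristic polynomial (4.3.11), multiplied out to avoid the poles at $t = \lambda_i$, at each prescribed $\hat{\lambda}_k$.

```python
import math

def border_data(lam, lam_hat):
    """Given distinct lam[0] < ... < lam[n-1], interlaced by lam_hat (length n+1),
    return (a, y) with a = sum(lam_hat) - sum(lam) and y_i^2 = -f(lam_i)/g'(lam_i)."""
    n = len(lam)
    a = sum(lam_hat) - sum(lam)
    y = []
    for i in range(n):
        f_val = math.prod(lam[i] - mu for mu in lam_hat)                 # f(lam_i)
        g_der = math.prod(lam[i] - lam[j] for j in range(n) if j != i)   # g'(lam_i)
        y.append(math.sqrt(max(0.0, -f_val / g_der)))   # guard against tiny negatives
    return a, y

def char_poly(t, lam, a, y):
    """Evaluate (4.3.11) in the pole-free form
    (t - a) * prod_i (t - lam_i) - sum_i y_i^2 * prod_{j != i} (t - lam_j)."""
    n = len(lam)
    total = (t - a) * math.prod(t - l for l in lam)
    for i in range(n):
        total -= y[i] ** 2 * math.prod(t - lam[j] for j in range(n) if j != i)
    return total

# Hypothetical interlacing data: 0 <= 1 <= 2 <= 3 <= 4 <= 6 <= 9.
lam = [1.0, 3.0, 6.0]
lam_hat = [0.0, 2.0, 4.0, 9.0]
a, y = border_data(lam, lam_hat)
residuals = [char_poly(mu, lam, a, y) for mu in lam_hat]
```

For these data one finds $a = 5$ and $y_1^2 = 2.4$, $y_2^2 = 3$, $y_3^2 = 9.6$, and the residuals vanish up to rounding, so the bordered matrix with $\Lambda = \operatorname{diag}(1, 3, 6)$ has the prescribed eigenvalues $0, 2, 4, 9$.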


The preceding results treat the situation in which a Hermitian matrix is "bordered" by adding a new last row and column, but they could also be thought of as giving information about the behavior of the eigenvalues of a Hermitian matrix when its last row and column are deleted. There is, of course, nothing special about the last row and column. If the $i$th row and column of the matrix $\hat{A}$ in (4.3.8) are deleted instead of the $(n+1)$st, one merely changes $e_{n+1}$ to $e_i$ in the proof and obtains the same interlacing inequalities (4.3.9).


Theorems (4.3.8) and (4.3.10) together say that the interlacing inequalities (4.3.9) are a complete description of the relationship between the eigenvalues of a Hermitian matrix and the eigenvalues of any one of its principal submatrices of order $n-1$. If one considers simultaneously all $n$ of the $(n-1)$-by-$(n-1)$ principal submatrices of $A$, more can be said. Let $A_j$ denote the principal submatrix obtained by deleting the $j$th row and column of $A$, $j = 1, 2, \ldots, n$, and let the eigenvalues of $A$ and $A_j$ be arranged in increasing order. For each $i = 1, 2, \ldots, n-1$ we have

$$\max_{1 \le j \le n} \lambda_i(A_j) \ge \frac{n-i}{n} \lambda_i(A) + \frac{i}{n} \lambda_{i+1}(A)$$

$$\min_{1 \le j \le n} \lambda_i(A_j) \le \frac{n-i}{n} \lambda_i(A) + \frac{i}{n} \lambda_{i+1}(A)$$
lsjsn n n 

and 
, n—2 1/2 

max An—1(Aj) — min naa (5) [An ( 4) (A)] 
isj<n l<jsn n 


If all the eigenvalues of $A$ are nonnegative, that is, if $A$ is positive semidefinite, the first of these three inequalities implies that there is at least one principal submatrix $A_j$ for which

$$\lambda_{n-1}(A_j) \ge \frac{n-1}{n} \lambda_n(A)$$

Thus, it is not possible for the spectral radius of every principal submatrix of a positive semidefinite Hermitian matrix to be "small."

One may wish to delete several rows and the corresponding columns 
from a Hermitian matrix. The remaining matrix is a principal submatrix 
of the original matrix. The following result can be obtained by repeated 
application of the interlacing inequalities (4.3.9), but it is just as easy to 
prove the assertions directly from the Courant-Fischer theorem. This 
result is sometimes called the inclusion principle. 


4.3.15 Theorem. Let $A \in M_n$ be a Hermitian matrix, let $r$ be an integer with $1 \le r \le n$, and let $A_r$ denote any $r$-by-$r$ principal submatrix of $A$ (obtained by deleting $n - r$ rows and the corresponding columns from $A$). For each integer $k$ such that $1 \le k \le r$ we have

$$\lambda_k(A) \le \lambda_k(A_r) \le \lambda_{k+n-r}(A)$$


Proof: Suppose $A_r \in M_r$ is formed by deleting rows $i_1, \ldots, i_{n-r}$ and the corresponding columns from $A$, and let $1 \le k \le r$. Use (4.2.12) to write

$$\begin{aligned}
\lambda_{k+n-r}(A) &= \min_{w_1, \ldots, w_{r-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, \ldots, w_{r-k}}} \frac{x^*Ax}{x^*x} \\
&\ge \min_{w_1, \ldots, w_{r-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, \ldots, w_{r-k} \\ x \perp e_{i_1}, \ldots, e_{i_{n-r}}}} \frac{x^*Ax}{x^*x} \\
&= \min_{v_1, \ldots, v_{r-k} \in \mathbf{C}^r} \ \max_{\substack{y \ne 0,\ y \in \mathbf{C}^r \\ y \perp v_1, \ldots, v_{r-k}}} \frac{y^*A_r y}{y^*y} = \lambda_k(A_r)
\end{aligned}$$

Again, assuming $1 \le k \le r$, use (4.2.13) to write

$$\begin{aligned}
\lambda_k(A) &= \max_{w_1, \ldots, w_{k-1} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, \ldots, w_{k-1}}} \frac{x^*Ax}{x^*x} \\
&\le \max_{w_1, \ldots, w_{k-1} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0,\ x \in \mathbf{C}^n \\ x \perp w_1, \ldots, w_{k-1} \\ x \perp e_{i_1}, \ldots, e_{i_{n-r}}}} \frac{x^*Ax}{x^*x} \\
&= \max_{v_1, \ldots, v_{k-1} \in \mathbf{C}^r} \ \min_{\substack{y \ne 0,\ y \in \mathbf{C}^r \\ y \perp v_1, \ldots, v_{k-1}}} \frac{y^*A_r y}{y^*y} = \lambda_k(A_r)
\end{aligned}$$

□

The following easy consequence of Theorem (4.3.15) is known as the Poincaré separation theorem. It can be employed in situations (such as in quantum mechanics) in which one has information about the inner products $u_i^*Au_j$.

4.3.16 Corollary. Let $A \in M_n$ be Hermitian, let $r$ be a given integer with $1 \le r \le n$, and let $u_1, \ldots, u_r \in \mathbf{C}^n$ be $r$ given orthonormal vectors. Let $B_r = [u_i^*Au_j] \in M_r$. If the eigenvalues of $A$ and $B_r$ are arranged in increasing order (4.2.1), we have

$$\lambda_k(A) \le \lambda_k(B_r) \le \lambda_{k+n-r}(A), \qquad k = 1, 2, \ldots, r \tag{4.3.17}$$

Proof: If $r < n$, choose $n - r$ additional vectors $u_{r+1}, \ldots, u_n$ so that $\{u_1, \ldots, u_r, u_{r+1}, \ldots, u_n\}$ is an orthonormal set, and let $U = [u_1\ \cdots\ u_n] \in M_n$. The matrix $U$ is unitary, $U^*AU$ has the same eigenvalues as $A$, and the given matrix $B_r$ is a principal submatrix of $U^*AU$ obtained by deleting the last $n - r$ rows and columns. The assertion now follows from (4.3.15). □

The matrix $B_r \in M_r$ in the preceding result can be written as $B_r = U^*AU$, where $U \in M_{n,r}$ is a matrix with $r$ orthonormal columns. Since $\operatorname{tr} B_r = \lambda_1(B_r) + \cdots + \lambda_r(B_r)$, the following extremal characterization follows from summing the inequalities (4.3.17).

4.3.18 Corollary. Let $A \in M_n$ be Hermitian and let $r$ be a given integer with $1 \le r \le n$. Then

$$\lambda_1(A) + \cdots + \lambda_r(A) = \min_{\substack{U \in M_{n,r} \\ U^*U = I \in M_r}} \operatorname{tr} U^*AU \tag{4.3.19}$$

$$\lambda_{n-r+1}(A) + \cdots + \lambda_n(A) = \max_{\substack{U \in M_{n,r} \\ U^*U = I \in M_r}} \operatorname{tr} U^*AU \tag{4.3.20}$$

Equality holds in (4.3.19) if the columns of $U$ are chosen to be orthonormal eigenvectors corresponding to the $r$ smallest eigenvalues of $A$. A similar choice yields equality in (4.3.20). These two inequalities may be thought of as generalizations of the Rayleigh-Ritz theorem (4.2.2). They can be used to prove many other interesting inequalities.

Sometimes one has bounds on the behavior of the quadratic form $x^*Ax$ on a subspace. The Courant-Fischer theorem can be employed in this case to give bounds on the eigenvalues of $A$.

4.3.21 Theorem. Let $A \in M_n$ be Hermitian, let $k$ be a given integer with $1 \le k \le n$, let the eigenvalues of $A$ be arranged in increasing order (4.2.1), and let $S_k$ be a given $k$-dimensional subspace of $\mathbf{C}^n$. If there exists a constant $c_2$ such that $x^*Ax \ge c_2 x^*x$ for all $x \in S_k$, then $\lambda_n \ge \lambda_{n-1} \ge \cdots \ge \lambda_{n-k+1} \ge c_2$. If there exists a constant $c_1$ such that $x^*Ax \le c_1 x^*x$ for all $x \in S_k$, then $c_1 \ge \lambda_k \ge \cdots \ge \lambda_1$.

Proof: Let $u_1, \ldots, u_{n-k}$ be $n - k$ orthonormal vectors that span $S_k^{\perp}$. Use (4.2.13) to write

$$c_2 \le \min_{\substack{x \ne 0 \\ x \in S_k}} \frac{x^*Ax}{x^*x} = \min_{\substack{x \ne 0 \\ x \perp u_1, \ldots, u_{n-k}}} \frac{x^*Ax}{x^*x} \le \max_{w_1, \ldots, w_{n-k} \in \mathbf{C}^n} \ \min_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} = \lambda_{n-k+1} \tag{4.3.22}$$

Similarly, we can use (4.2.12) to write

$$c_1 \ge \max_{\substack{x \ne 0 \\ x \in S_k}} \frac{x^*Ax}{x^*x} = \max_{\substack{x \ne 0 \\ x \perp u_1, \ldots, u_{n-k}}} \frac{x^*Ax}{x^*x} \ge \min_{w_1, \ldots, w_{n-k} \in \mathbf{C}^n} \ \max_{\substack{x \ne 0 \\ x \perp w_1, \ldots, w_{n-k}}} \frac{x^*Ax}{x^*x} = \lambda_k$$

□


4.3.23 Corollary. If $A \in M_n$ is Hermitian and if $x^*Ax \ge 0$ for all vectors $x$ in a $k$-dimensional subspace, then $A$ has at least $k$ nonnegative eigenvalues. If $x^*Ax > 0$ for all nonzero vectors $x$ in a $k$-dimensional subspace, then $A$ has at least $k$ positive eigenvalues.

Proof: The first assertion follows from the preceding theorem with $c_2 = 0$. If $\lambda_{n-k+1} = 0$, then the inequalities (4.3.22) show that

$$0 = \min_{\substack{x \ne 0 \\ x \in S_k}} \frac{x^*Ax}{x^*x} = \min_{\substack{x^*x = 1 \\ x \in S_k}} x^*Ax$$

But $S_k$ is finite-dimensional, so the set $D = \{x \in S_k : x^*x = 1\}$ is a compact set [see (5.5.6)] and the continuous function $x^*Ax$ achieves its minimum on $D$ for some $x_0 \in S_k$ such that $x_0^*x_0 = 1$; in particular, $x_0 \ne 0$. But then $x_0^*Ax_0 = 0$ contradicts the assumption that $x^*Ax > 0$ whenever $x \in S_k$ and $x \ne 0$. □


Both the eigenvalues and the main diagonal elements of a Hermitian matrix are real numbers, and the sum of the eigenvalues is the same as the sum of the main diagonal elements (the trace). The precise relationship between the main diagonal elements and the eigenvalues is given by the notion of majorization.


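4.3.24 Definition. Let $\alpha = [\alpha_i] \in \mathbf{R}^n$ and $\beta = [\beta_i] \in \mathbf{R}^n$ be given. The vector $\beta$ is said to majorize the vector $\alpha$ if

$$\min \left\{ \sum_{j=1}^k \beta_{i_j} : 1 \le i_1 < \cdots < i_k \le n \right\} \ge \min \left\{ \sum_{j=1}^k \alpha_{i_j} : 1 \le i_1 < \cdots < i_k \le n \right\}$$

for all $k = 1, 2, \ldots, n$ with equality for $k = n$. If we arrange the entries of $\alpha$ and $\beta$ in increasing order $\alpha_{j_1} \le \alpha_{j_2} \le \cdots \le \alpha_{j_n}$, $\beta_{m_1} \le \beta_{m_2} \le \cdots \le \beta_{m_n}$, the defining inequalities can be restated in the equivalent form

$$\sum_{i=1}^k \beta_{m_i} \ge \sum_{i=1}^k \alpha_{j_i} \quad \text{for all } k = 1, 2, \ldots, n \tag{4.3.25}$$

with equality for $k = n$.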




Thus, the real vector $\beta$ majorizes the real vector $\alpha$ if the sum of the $k$ smallest entries of $\beta$ is greater than or equal to the sum of the $k$ smallest entries of $\alpha$ for $k = 1, 2, \ldots, n-1$ and the sums of the entries of $\beta$ and $\alpha$ are equal. Notice that the entries of $\beta$ and $\alpha$ may be permuted arbitrarily without affecting whether $\beta$ majorizes $\alpha$.
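The sorted form (4.3.25) of the definition translates directly into a test. The following Python sketch is an illustration only; the function name `majorizes` and the tolerance parameter `tol` are assumptions introduced here. It follows the convention of this section, in which partial sums of the smallest entries are compared.

```python
def majorizes(beta, alpha, tol=1e-12):
    """True if beta majorizes alpha in the sense of (4.3.25): after sorting both
    vectors in increasing order, every partial sum of beta is >= the matching
    partial sum of alpha, and the total sums agree."""
    if len(beta) != len(alpha):
        return False
    b, a = sorted(beta), sorted(alpha)
    sum_b = sum_a = 0.0
    for k in range(len(a)):
        sum_b += b[k]
        sum_a += a[k]
        if sum_b < sum_a - tol:
            return False
    return abs(sum_b - sum_a) <= tol     # equality is required for k = n

examples = [
    majorizes([2, 2, 2], [1, 2, 3]),   # a constant vector majorizes any vector with the same sum
    majorizes([1, 2, 3], [2, 2, 2]),   # but not conversely
    majorizes([3, 1, 2], [1, 2, 3]),   # the order of the entries does not matter
    majorizes([2, 2, 3], [1, 2, 3]),   # the total sums differ
]
```

The four sample calls return `True`, `False`, `True`, `False`, respectively. Sources that reverse the inequality or sort in decreasing order would need the comparison adjusted accordingly.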

The notion of majorization is an important one that arises in many 
places in matrix theory as the precise relationship between two sets of 
real numbers. One example of this phenomenon is the following theorem 
of Schur (1923). 


4.3.26 Theorem. Let $A \in M_n$ be Hermitian. The vector of diagonal entries of $A$ majorizes the vector of eigenvalues of $A$.

Proof: The proof is by induction on the dimension. For $n = 1$, there is nothing to show, so we suppose that the result is valid for Hermitian matrices of dimension $k$ for all $k \le n-1$. Let $A = [a_{ij}] \in M_n$ be a given Hermitian matrix and let $\hat{A} \in M_{n-1}$ be the principal submatrix of $A$ obtained by deleting the row and column corresponding to the largest diagonal entry of $A$. Let $\lambda_1 \le \cdots \le \lambda_n$ be the ordered eigenvalues of $A$, let $\hat{\lambda}_1 \le \cdots \le \hat{\lambda}_{n-1}$ be the ordered eigenvalues of $\hat{A}$, and let $a_{i_1 i_1} \le a_{i_2 i_2} \le \cdots \le a_{i_n i_n}$ be a rearrangement of the diagonal entries into increasing order. By the induction hypothesis, we have

$$\sum_{j=1}^k a_{i_j i_j} \ge \sum_{j=1}^k \hat{\lambda}_j \quad \text{for all } k = 1, \ldots, n-1$$

Because of interlacing [Theorem (4.3.8)], we also have

$$\lambda_1 \le \hat{\lambda}_1 \le \lambda_2 \le \hat{\lambda}_2 \le \cdots \le \hat{\lambda}_{n-1} \le \lambda_n$$

and hence

$$\sum_{j=1}^k \hat{\lambda}_j \ge \sum_{j=1}^k \lambda_j \quad \text{for all } k = 1, \ldots, n-1$$

Thus,

$$\sum_{j=1}^k a_{i_j i_j} \ge \sum_{j=1}^k \lambda_j, \qquad k = 1, \ldots, n-1$$

and equality holds for $k = n$ because the trace is the sum of the eigenvalues. □
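Schur's theorem can be observed on a small example in which the eigenvalues are known in advance. The Python sketch below is an illustration under assumptions made here: the matrix is the hypothetical test matrix $A = Q\,\operatorname{diag}(1, 2, 5)\,Q^T$, with $Q$ the Householder reflection determined by $v = [1, 2, 2]^T$, and the helper name `householder` is not from the text. The code compares partial sums of the sorted diagonal entries with partial sums of the sorted eigenvalues, as in (4.3.25).

```python
def householder(v):
    """Orthogonal reflection Q = I - 2 v v^T / (v^T v), returned as a list of rows."""
    n = len(v)
    vv = sum(c * c for c in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
            for i in range(n)]

lam = [1.0, 2.0, 5.0]                     # prescribed eigenvalues
n = len(lam)
Q = householder([1.0, 2.0, 2.0])
A = [[sum(Q[i][k] * lam[k] * Q[j][k] for k in range(n)) for j in range(n)]
     for i in range(n)]

diag_sorted = sorted(A[i][i] for i in range(n))
lam_sorted = sorted(lam)
# Pairs (sum of k smallest diagonal entries, sum of k smallest eigenvalues).
partial_sums = [(sum(diag_sorted[:k]), sum(lam_sorted[:k])) for k in range(1, n + 1)]
```

Here the diagonal entries are $161/81$, $338/81$, and $149/81$, so the partial sums of the sorted diagonal are approximately $1.840$, $3.827$, and $8$, against $1$, $3$, and $8$ for the eigenvalues: each inequality in (4.3.25) holds, with equality for $k = n$.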


Majorization is also useful in expressing the relationship between the eigenvalues of a sum and those of its summands.




4.3.27 Theorem. Let $A, B \in M_n$ be Hermitian matrices, and let $\lambda(A) = [\lambda_i(A)]$, $\lambda(B) = [\lambda_i(B)]$, and $\lambda(A+B) = [\lambda_i(A+B)]$ denote the column vectors in $\mathbf{R}^n$ whose components are the eigenvalues of $A$, $B$, and $A+B$ arranged in increasing order (4.2.1). The vector $\lambda(A+B)$ majorizes the vector $\lambda(A) + \lambda(B)$.


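Proof: For any $k = 1, 2, \ldots, n$, use (4.3.18) to write

$$\begin{aligned}
\sum_{i=1}^k \lambda_i(A+B) &= \min_{U^*U = I \in M_k} \operatorname{tr} U^*(A+B)U \\
&= \min_{U^*U = I \in M_k} \left( \operatorname{tr} U^*AU + \operatorname{tr} U^*BU \right) \\
&\ge \min_{U^*U = I \in M_k} \operatorname{tr} U^*AU + \min_{U^*U = I \in M_k} \operatorname{tr} U^*BU \\
&= \sum_{i=1}^k \lambda_i(A) + \sum_{i=1}^k \lambda_i(B) = \sum_{i=1}^k \left[ \lambda_i(A) + \lambda_i(B) \right]
\end{aligned}$$

Since $\operatorname{tr}(A+B) = \operatorname{tr} A + \operatorname{tr} B$, we have equality for $k = n$. □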


We have alluded to the fact that majorization is the precise relation- 
ship between the main diagonal elements of a Hermitian matrix and its 
eigenvalues, but we have established only half of this relationship in 
(4.3.26). To show the other half, we need the following technical lemma. 


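4.3.28 Lemma. Let $n \ge 2$ and let $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_n$ and $\beta_1 \le \beta_2 \le \cdots \le \beta_n$ be given real numbers. If the vector $\beta = [\beta_i]$ majorizes the vector $\alpha = [\alpha_i]$, then there are $n-1$ real numbers $\gamma_1, \ldots, \gamma_{n-1}$ such that

$$\alpha_1 \le \gamma_1 \le \alpha_2 \le \gamma_2 \le \alpha_3 \le \cdots \le \alpha_{n-1} \le \gamma_{n-1} \le \alpha_n$$

and such that $\beta' = [\beta_1, \ldots, \beta_{n-1}]^T \in \mathbf{R}^{n-1}$ majorizes $\gamma = [\gamma_1, \ldots, \gamma_{n-1}]^T \in \mathbf{R}^{n-1}$.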


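Proof: If $n = 2$, we have $\alpha_1 \le \beta_1$ and $\alpha_1 + \alpha_2 = \beta_1 + \beta_2$, or

$$\alpha_2 = (\beta_1 - \alpha_1) + \beta_2 \ge \beta_2 \ge \beta_1$$

Thus, $\alpha_1 \le \beta_1 \le \alpha_2$, so we can choose $\gamma_1 = \beta_1$ and satisfy the stated conditions. Now assume that $n > 2$ and let $\Delta = \{[\delta_1, \ldots, \delta_{n-1}]^T\} \subset \mathbf{R}^{n-1}$ denote the set of points defined by the inequalities

$$\alpha_1 \le \delta_1 \le \alpha_2 \le \delta_2 \le \alpha_3 \le \cdots \le \alpha_{n-1} \le \delta_{n-1} \le \alpha_n \tag{4.3.29a}$$

$$\sum_{i=1}^k \delta_i \le \sum_{i=1}^k \beta_i, \qquad k = 1, 2, \ldots, n-2 \tag{4.3.29b}$$

Because $\beta$ majorizes $\alpha$, the point $\delta = \alpha' = [\alpha_1, \ldots, \alpha_{n-1}]^T$ is always in $\Delta$, so the set $\Delta$ is always nonempty. The set $\Delta$ is evidently bounded and closed, hence compact, and it is easily seen to be convex. If $\delta = [\delta_1, \ldots, \delta_{n-1}]^T \in \Delta$, define $f(\delta) = \delta_1 + \delta_2 + \cdots + \delta_{n-1}$. Notice that $f(\alpha') = \alpha_1 + \cdots + \alpha_{n-1} \le \beta_1 + \cdots + \beta_{n-1}$. If we can show that there is some $\hat{\delta} \in \Delta$ with $f(\hat{\delta}) \ge \beta_1 + \cdots + \beta_{n-1}$, then by convexity of $\Delta$ we will have $t\alpha' + (1-t)\hat{\delta} \in \Delta$ for all $t \in [0, 1]$ and $g(t) = f(t\alpha' + (1-t)\hat{\delta})$ will be a continuous function with

$$g(0) \ge \beta_1 + \cdots + \beta_{n-1} \ge g(1)$$

From this we can conclude that for some $t_0 \in [0, 1]$, we have $g(t_0) = \beta_1 + \cdots + \beta_{n-1}$. The point $\gamma = [\gamma_i] = t_0\alpha' + (1 - t_0)\hat{\delta}$ will satisfy the stated conditions.

Since $f(\cdot)$ is a continuous function on the compact set $\Delta$, there is a point $\hat{\delta} \in \Delta$ such that

$$\max_{\delta \in \Delta} f(\delta) = f(\hat{\delta}) \tag{4.3.30}$$

We shall show that $f(\hat{\delta}) \ge \beta_1 + \cdots + \beta_{n-1}$. A maximizing point $\hat{\delta} \in \Delta$ satisfies the inequalities (4.3.29a and b), and hence

$$\hat{\delta}_k \le \alpha_{k+1}, \qquad k = 1, 2, \ldots, n-1 \tag{4.3.31a}$$

$$\sum_{i=1}^k \hat{\delta}_i \le \sum_{i=1}^k \beta_i, \qquad k = 1, 2, \ldots, n-2 \tag{4.3.31b}$$

If all the inequalities (4.3.31b) are strict, and if at least one of the inequalities (4.3.31a) is not an equality, then at least one component of $\hat{\delta}$ can be increased with a consequent increase in the value of $f(\hat{\delta})$. Since this contradicts the extremal property (4.3.30), we conclude that all the inequalities (4.3.31a) must be equalities, $\hat{\delta} = [\alpha_2, \alpha_3, \ldots, \alpha_n]^T$, and

$$\begin{aligned}
f(\hat{\delta}) &= \alpha_2 + \alpha_3 + \cdots + \alpha_n = (\alpha_1 + \alpha_2 + \cdots + \alpha_n) - \alpha_1 = (\beta_1 + \beta_2 + \cdots + \beta_n) - \alpha_1 \\
&= (\beta_1 + \cdots + \beta_{n-1}) + \beta_n - \alpha_1 \ge (\beta_1 + \cdots + \beta_{n-1}) + \beta_1 - \alpha_1 \ge \beta_1 + \cdots + \beta_{n-1}
\end{aligned}$$

which is what we wish to show.

If not all the inequalities (4.3.31b) are strict, then equality holds for at least one value of $k$. Let $r$ denote the largest such value of $k$. Then

$$\sum_{i=1}^r \hat{\delta}_i = \sum_{i=1}^r \beta_i \quad \text{and} \quad \sum_{i=1}^k \hat{\delta}_i < \sum_{i=1}^k \beta_i \quad \text{for } k = r+1, \ldots, n-2$$

By the same argument as in the preceding paragraph, we must have $\hat{\delta}_k = \alpha_{k+1}$ for $k = r+1, \ldots, n-1$. Thus,

$$\begin{aligned}
f(\hat{\delta}) &= (\hat{\delta}_1 + \cdots + \hat{\delta}_r) + (\hat{\delta}_{r+1} + \cdots + \hat{\delta}_{n-1}) \\
&= (\beta_1 + \cdots + \beta_r) + (\alpha_{r+2} + \cdots + \alpha_n) \\
&= (\beta_1 + \cdots + \beta_{n-1}) + (\alpha_1 + \cdots + \alpha_n) - (\alpha_1 + \cdots + \alpha_{r+1}) - (\beta_{r+1} + \cdots + \beta_{n-1}) \\
&= (\beta_1 + \cdots + \beta_{n-1}) + (\beta_1 + \cdots + \beta_n) - (\alpha_1 + \cdots + \alpha_{r+1}) - (\beta_{r+1} + \cdots + \beta_{n-1}) \\
&= (\beta_1 + \cdots + \beta_{n-1}) + \left[ (\beta_1 + \cdots + \beta_{r+1}) - (\alpha_1 + \cdots + \alpha_{r+1}) \right] + (\beta_{r+2} + \cdots + \beta_n) - (\beta_{r+1} + \cdots + \beta_{n-1}) \\
&\ge (\beta_1 + \cdots + \beta_{n-1}) + (\beta_{r+2} - \beta_{r+1}) + (\beta_{r+3} - \beta_{r+2}) + \cdots + (\beta_n - \beta_{n-1}) \\
&\ge \beta_1 + \cdots + \beta_{n-1}
\end{aligned}$$

□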


We can now prove a converse to (4.3.26). 


4.3.32 Theorem. Let $n \ge 1$ and let $a_1 \le a_2 \le \cdots \le a_n$ and $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be given real numbers. If the vector $a = [a_i]$ majorizes the vector $\lambda = [\lambda_i]$, then there exists a real symmetric matrix $A = [a_{ij}] \in M_n(\mathbf{R})$ such that $a_{ii} = a_i$ for $i = 1, 2, \ldots, n$ and such that $\{\lambda_i\}$ is the set of eigenvalues of $A$.


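Proof: The assertion is trivial for $n = 1$. Suppose it has been proved for all such vectors $a$ and $\lambda$ with at most $n-1$ elements. By the lemma there exist real numbers $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_{n-1}$ such that

$$\lambda_1 \le \gamma_1 \le \lambda_2 \le \gamma_2 \le \cdots \le \lambda_{n-1} \le \gamma_{n-1} \le \lambda_n$$

and such that $a' = [a_1, \ldots, a_{n-1}]^T$ majorizes $\gamma = [\gamma_i] \in \mathbf{R}^{n-1}$. By the induction hypothesis there is a real symmetric matrix $B = [b_{ij}] \in M_{n-1}$ with $b_{ii} = a_i$ for $i = 1, 2, \ldots, n-1$ such that $\{\gamma_i\}$ is the set of eigenvalues of $B$. If $\Gamma = \operatorname{diag}(\gamma_1, \gamma_2, \ldots, \gamma_{n-1}) \in M_{n-1}(\mathbf{R})$, there is a real orthogonal matrix $Q \in M_{n-1}(\mathbf{R})$ such that $B = Q\Gamma Q^T$. By Theorem (4.3.10) there is a real symmetric matrix

$$\hat{A} = \begin{bmatrix} \Gamma & y \\ y^T & \alpha \end{bmatrix} \in M_n(\mathbf{R}), \qquad y \in \mathbf{R}^{n-1}, \quad \alpha \in \mathbf{R}$$

that has eigenvalues $\{\lambda_i\}$. If we set

$$A = \begin{bmatrix} Q & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \Gamma & y \\ y^T & \alpha \end{bmatrix} \begin{bmatrix} Q^T & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} Q\Gamma Q^T & Qy \\ (Qy)^T & \alpha \end{bmatrix} = \begin{bmatrix} B & Qy \\ (Qy)^T & \alpha \end{bmatrix}$$

then $A$ has eigenvalues $\{\lambda_i\}$ and has main diagonal entries $a_1, a_2, \ldots, a_{n-1}, \alpha$. But $\operatorname{tr} A = a_1 + \cdots + a_{n-1} + \alpha = \lambda_1 + \cdots + \lambda_n = a_1 + \cdots + a_n$ by majorization, so $\alpha = a_n$ and $A$ has the correct diagonal entries. □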
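For $n = 2$ the matrix promised by (4.3.32) can be written down directly, which gives a useful sanity check on the statement. The sketch below is an illustration and not the inductive construction of the proof: it assumes the closed form $b^2 = a_1 a_2 - \lambda_1\lambda_2$ for the off-diagonal entry, obtained by matching the determinant, and the function name `symmetric_2x2` is introduced here. Majorization guarantees $b^2 = (a_1 - \lambda_1)(\lambda_2 - a_1) \ge 0$.

```python
import math

def symmetric_2x2(a, lam, tol=1e-12):
    """Return a real symmetric matrix [[a1, b], [b, a2]] with eigenvalues lam,
    assuming a = (a1, a2) majorizes lam = (lam1, lam2) in the sense of (4.3.25)."""
    a1, a2 = sorted(a)
    l1, l2 = sorted(lam)
    if abs((a1 + a2) - (l1 + l2)) > tol or a1 < l1 - tol:
        raise ValueError("a does not majorize lam")
    b = math.sqrt(max(0.0, a1 * a2 - l1 * l2))   # chosen so that det A = lam1 * lam2
    return [[a1, b], [b, a2]]

# Hypothetical data: diagonal (2, 4) majorizes eigenvalues (1, 5).
A = symmetric_2x2((2.0, 4.0), (1.0, 5.0))
trace = A[0][0] + A[1][1]                        # equals lam1 + lam2 = 6
det = A[0][0] * A[1][1] - A[0][1] ** 2           # equals lam1 * lam2 = 5
```

Since a 2-by-2 matrix with trace 6 and determinant 5 has characteristic polynomial $t^2 - 6t + 5 = (t-1)(t-5)$, the matrix $\begin{bmatrix} 2 & \sqrt{3} \\ \sqrt{3} & 4 \end{bmatrix}$ has the prescribed diagonal and eigenvalues.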


The preceding result not only completes the circle of implications 
dealing with the relationships between the main diagonal entries and 
eigenvalues of a Hermitian matrix, but also permits us to clarify the geo- 
metric meaning of the majorization relation itself. A doubly stochastic 
matrix A € M, has n? nonnegative entries such that the sum of the entries 
in every row and every column is +1. Birkhoff’s theorem (8.7.1) guar- 
antees that every doubly stochastic matrix is a convex combination of 
finitely many permutation matrices and conversely. 


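4.3.33 Theorem. Let $\alpha = [\alpha_i] \in \mathbf{R}^n$ and $\beta = [\beta_i] \in \mathbf{R}^n$ be two given real vectors. The following are equivalent:

(a) $\beta$ majorizes $\alpha$;

(b) There is a doubly stochastic matrix $S \in M_n$ such that $\beta = S\alpha$; and

(c) $\beta \in \left\{ \sum_{i=1}^N p_i \alpha_{\pi_i} \right\}$, where $1 \le N < \infty$, $p_i \ge 0$, $\sum_{i=1}^N p_i = 1$, and $\alpha_{\pi_i} \in \mathbf{R}^n$ is a vector whose components are some permutation of the components of the given vector $\alpha$.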


Proof: If we assume (a), then by (4.3.32) there is a real symmetric matrix $B = [b_{ij}] \in M_n$ with main diagonal $b_{ii} = \beta_i$ and eigenvalues $\lambda_i(B) = \alpha_i$. By the spectral theorem there is a unitary (even real orthogonal) matrix $U = [u_{ij}] \in M_n$ such that $B = U\Lambda U^*$, where $\Lambda = \mathrm{diag}(\alpha_1, \ldots, \alpha_n)$, and consideration of the main diagonal entries of $B$ shows that $\beta = S\alpha$, where $S = [s_{ij}] \in M_n$ is given by $s_{ij} = |u_{ij}|^2$. Such a matrix $S$ has every row and column sum equal to 1 since every row and column of $U$ is a unit vector, so $S$ is a doubly stochastic matrix (of a special type known as an orthostochastic matrix). This shows that (a) implies (b).

A proof that (b) implies (a) is outlined in Problem 9 at the end of this section.

If we assume (b), then Birkhoff's theorem (8.7.1) shows that

$$S = \sum_{i=1}^{N} p_i P_i, \qquad \text{where } p_i \ge 0,\ \sum_{i=1}^{N} p_i = 1$$

and each $P_i$ is a permutation matrix. Thus,

$$\beta = S\alpha = \sum_{i=1}^{N} p_i P_i \alpha = \sum_{i=1}^{N} p_i \alpha_{\pi_i}, \qquad \text{where } P_i\alpha \equiv \alpha_{\pi_i}$$

This identity also establishes the reverse implication. □
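
The step from (a) to (b) can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it builds the orthostochastic matrix $S = [|u_{ij}|^2]$ from the eigenvectors of a randomly generated real symmetric matrix and checks that the main diagonal equals $S$ times the vector of eigenvalues. The random test matrix, the tolerance, and the helper name `majorizes` are assumptions made for the example; `majorizes` assumes the convention, suggested by the hint to Problem 15, that $\beta$ majorizes $\alpha$ when each sum of the $k$ smallest entries of $\beta$ is at least the corresponding sum for $\alpha$, with equality for $k = n$.

```python
import numpy as np

def majorizes(b, a, tol=1e-10):
    """Hypothetical helper: True if b majorizes a in the sense assumed above."""
    cb = np.cumsum(np.sort(b))   # sums of the k smallest entries of b
    ca = np.cumsum(np.sort(a))   # sums of the k smallest entries of a
    return bool(np.all(cb >= ca - tol) and abs(cb[-1] - ca[-1]) <= tol)

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
B = (G + G.T) / 2                 # real symmetric test matrix (assumed example)
alpha, U = np.linalg.eigh(B)      # B = U diag(alpha) U^T with U real orthogonal
S = np.abs(U) ** 2                # s_ij = |u_ij|^2
beta = np.diag(B)                 # main diagonal entries of B

rows_ok = np.allclose(S.sum(axis=1), 1.0)   # every row sum of S is 1
cols_ok = np.allclose(S.sum(axis=0), 1.0)   # every column sum of S is 1
diag_ok = np.allclose(beta, S @ alpha)      # beta = S alpha
print(rows_ok, cols_ok, diag_ok, majorizes(beta, alpha))
```

Under the stated convention, all four printed values should be true for any real symmetric test matrix, since the main diagonal of a Hermitian matrix majorizes its eigenvalues (4.3.26).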


Thus, the collection of all vectors $\beta = [\beta_1, \ldots, \beta_n]^T$ that majorize a given vector $\alpha = [\alpha_1, \ldots, \alpha_n]^T$ may be obtained by computing the $n!$ vectors (not all distinct if not all the $\alpha_i$ terms are distinct) obtained by permuting the $n$ components of the vector $\alpha$ and then forming the convex hull of these vectors.

Remark: While there is universal agreement that the basic notion of majorization is important, there is not universal agreement on notation. Some authors define majorization with the inequality reversed in (4.3.25), and some define it with respect to decreasing arrangements of the two sets. For this reason, one should use caution when using or quoting results about majorization from different sources. See Problem 11 for a justification of our choice of the definition of majorization.


Problems

1. Recall that the spectral radius of a matrix $A \in M_n$ is the quantity

$$\rho(A) = \max\{|\lambda_i(A)|\}$$

Let $A, B \in M_n$ be Hermitian. Use Weyl's theorem (4.3.1) to show that

$$\lambda_1(B) \le \lambda_k(A+B) - \lambda_k(A) \le \lambda_n(B)$$

and hence that

$$|\lambda_k(A+B) - \lambda_k(A)| \le \rho(B)$$

for all $k = 1, 2, \ldots, n$. This is a simple example of a perturbation theorem for the eigenvalues of a Hermitian matrix [cf. (6.3)].

2. Show how to use only the first set of inequalities derived in the proof of Theorem (4.3.4) to obtain all the inequalities asserted in the theorem. Hint: $A = (A \pm zz^*) \mp zz^*$.

3. Provide the details to show that (4.3.6b) is equivalent to (4.3.6a).

4. The only case of Theorem (4.3.6) used in the proof of Weyl's theorem (4.3.7) is $\lambda_n(A+B) \ge \lambda_{n-r}(A)$, where $B$ has rank at most $r$. Show that this case can be proved without using the Courant-Fischer theorem by providing the details for the following argument. Suppose $B = \beta_1 y_1 y_1^* + \cdots + \beta_r y_r y_r^*$ and let $A = U\Lambda U^*$ with $U = [u_1 \cdots u_n]$ unitary. Then there exist $r+1$ scalars $\alpha_{n-r}, \alpha_{n-r+1}, \ldots, \alpha_n$ such that the vector $x = \alpha_{n-r}u_{n-r} + \cdots + \alpha_n u_n$ satisfies $x \perp y_i$ for all $i = 1, 2, \ldots, r$ and $x^*x = |\alpha_{n-r}|^2 + \cdots + |\alpha_n|^2 = 1$. Then

$$\lambda_n(A+B) \ge x^*(A+B)x = \sum_{i=n-r}^{n} |\alpha_i|^2 \lambda_i(A) \ge \lambda_{n-r}(A)$$

5. Deduce Theorem (4.3.15) by applying Theorem (4.3.8) $n-r$ times.

6. Show that Weyl's simple inequalities (4.3.2) need not hold if $A$ and $B$ are not Hermitian. Hint: Consider $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$.

7. If $A, B \in M_n$ are Hermitian matrices with eigenvalues arranged in increasing order, and if $1 \le k \le n$, show that

$$\lambda_k(A+B) \le \min\{\lambda_i(A) + \lambda_j(B) : i + j = k + n\}$$

8. Provide the details for the case of coincident $\lambda_i$ terms in the proof of Theorem (4.3.10). Hint: If $\lambda_i$ is a $k$-fold root of $g(t) = 0$ with $k \ge 2$, show that one obtains (4.3.14) in which the factor $(t - \lambda_i)^{k-1}$ in $g'(t)$ in the denominator of (4.3.14) is canceled into a factor $(t - \lambda_i)^{k-1}$ in $f(t)$ in the numerator of (4.3.14).

9. Let $S = [s_{ij}] \in M_n$ be a doubly stochastic matrix (8.7), and let $x \in \mathbb{R}^n$ be a real vector. Show that $Sx$ majorizes $x$. Hint: Let $y = Sx$. It suffices to assume that $y_1 \le \cdots \le y_n$ and $x_1 \le \cdots \le x_n$, for if not, one could consider $Py$ and $Qx$ for suitable permutation matrices $P$ and $Q$; $PSQ^T$ is still doubly stochastic. Let $w_j^{(k)} = \sum_{i=1}^{k} s_{ij}$, so $0 \le w_j^{(k)} \le 1$ and $\sum_{j=1}^{n} w_j^{(k)} = k$. Show that

$$\sum_{i=1}^{k} (y_i - x_i) = \sum_{j=1}^{n} w_j^{(k)} x_j - \sum_{i=1}^{k} x_i + x_k\left(k - \sum_{j=1}^{n} w_j^{(k)}\right) = \sum_{j=1}^{k} (w_j^{(k)} - 1)(x_j - x_k) + \sum_{j=k+1}^{n} w_j^{(k)}(x_j - x_k)$$

and that all the terms in the latter sums are nonnegative.

10. Give another proof of (4.3.26) using the following ideas: If $A = [a_{ij}] \in M_n$ is Hermitian, then $A = U\Lambda U^*$ with $U = [u_{ij}] \in M_n$ unitary and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ a real diagonal matrix. Let $a = [a_{11}, a_{22}, \ldots, a_{nn}]^T$ be the vector consisting of the main diagonal entries of $A$ and let $\lambda = [\lambda_1, \lambda_2, \ldots, \lambda_n]^T$. Show that $a = P\lambda$, where $P = [p_{ij}] = [|u_{ij}|^2]$. Show that $P$ is doubly stochastic and use Problem 9.

11. Let $x = [x_1, \ldots, x_n]^T$ and $y = [y_1, \ldots, y_n]^T$ be two given nonnegative real vectors and suppose that $y$ majorizes $x$. Show that $y_1 \cdots y_n \ge x_1 \cdots x_n$. Hint: Use (4.3.32) to construct a real symmetric matrix $A = [a_{ij}] \in M_n(\mathbb{R})$ with main diagonal entries $a_{ii} = y_i$ and eigenvalues $\lambda_i(A) = x_i$. Then Hadamard's inequality (7.8.1) says that $a_{11} \cdots a_{nn} \ge \det A = \lambda_1 \cdots \lambda_n$. Remark: It is this result that motivates our choice of the definition (4.3.24) of majorization. If one chooses the opposite direction for the inequality in (4.3.24), then the consequence is that the direction of

the inequality in Problem 11 is reversed; that is, if $y$ "majorizes" $x$ in this sense, then the product of the $y_i$ is less than the product of the $x_i$. We prefer a definition in which the inequality for the products goes in the same direction as the majorization.

12. Let $A \in M_n$ be a Hermitian matrix with positive eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ and let $1 \le r \le n$ be a given integer. Use Problem 11 to prove that

$$\lambda_1 \lambda_2 \cdots \lambda_r = \min\, (u_1^*Au_1)(u_2^*Au_2) \cdots (u_r^*Au_r)$$

where the minimum is over all sets of orthonormal vectors $\{u_1, u_2, \ldots, u_r\} \subset \mathbb{C}^n$. Explain why this result may be thought of as a multiplicative analog of (4.3.19), as a generalization of Hadamard's inequality (7.8.1), and as a chain of inequalities that connects the Rayleigh-Ritz theorem (4.2.2) to Hadamard's inequality. Hint: The case $r = n$ is (7.8.1). What is the case $r = 1$? If $2 \le r < n$, use (4.3.19) to show that the vector $[u_1^*Au_1, u_2^*Au_2, \ldots, u_r^*Au_r]^T$ majorizes the vector $[\mu_1, \ldots, \mu_r]^T$, where $\mu_i = \lambda_i$ for $i = 1, 2, \ldots, r-1$ and

$$\mu_r = (u_1^*Au_1 + \cdots + u_{r-1}^*Au_{r-1}) - (\lambda_1 + \cdots + \lambda_{r-1}) + u_r^*Au_r \ge u_r^*Au_r$$

Now use Problem 11 to show that $\lambda_1 \cdots \lambda_{r-1}\, u_r^*Au_r \le \prod_{i=1}^{r} u_i^*Au_i$.

13. Let $A = [a_{ij}] \in M_n$ be a Hermitian matrix with nonnegative eigenvalues $0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Show for each $r = 1, 2, \ldots, n$ that the product $\lambda_1 \cdots \lambda_r$ is less than or equal to the product of the $r$ smallest main diagonal entries of $A$.

14. If $A, B \in M_n$ are Hermitian and $A - B$ has only nonnegative eigenvalues, show that $\lambda_i(A) \ge \lambda_i(B)$ for all $i = 1, 2, \ldots, n$.

15. Use (4.3.18) to prove (4.3.26). Hint: By a permutation, arrange $A = [a_{ij}]$ so that $a_{11} \le a_{22} \le \cdots \le a_{nn}$. Then take $U = [e_1\ e_2\ \cdots\ e_r] \in M_{n,r}$ and show that $\lambda_1(A) + \cdots + \lambda_r(A) \le \mathrm{tr}\, U^*AU = a_{11} + \cdots + a_{rr}$.

16. Suppose $A \in M_n$ is Hermitian. Let $\lambda_1 \le \cdots \le \lambda_n$ be the eigenvalues of $A$ and $\lambda_{i,1} \le \cdots \le \lambda_{i,n-1}$ be the eigenvalues of the $(n-1)$-by-$(n-1)$ principal submatrix $A(\{i\}')$. Show that

$$\lambda_1 \le \lambda_{i,1} \le \lambda_2 \le \lambda_{i,2} \le \cdots \le \lambda_{i,n-1} \le \lambda_n$$

These interlacing inequalities are often attributed to Cauchy. Show also that these inequalities imply the inequalities in (4.3.15). Hint: Use (4.3.8) or (4.3.15).

17. If $A = [a_{ij}] \in M_n$ is Hermitian, and if $a_{ii} = \lambda_1$ for some $i$, show that $a_{ik} = a_{ki} = 0$ for all $k = 1, 2, \ldots, n$, $k \ne i$, and similarly if $a_{ii} = \lambda_n$. Hint: Show this by explicit calculation for $n = 2$ and apply interlacing.

18. Let $A \in M_n$ be Hermitian and let $a_i = \det A(\{1, 2, \ldots, i\})$, $i = 1, 2, \ldots, n$. If all $a_i \ne 0$, show that the number of negative eigenvalues of $A$ is equal to the number of sign changes in the sequence $+1, a_1, a_2, \ldots, a_n$. In particular, if all these principal minors are positive, $A$ has no negative eigenvalues at all. What happens if some $a_i = 0$? Hint: Use interlacing.

19. Let $A = [a_{ij}] \in M_n$ be normal. Show that if $A$ has a "small" column or row, then it must also have a "small" eigenvalue. More precisely, let the squares of the absolute values of the eigenvalues of $A$, $\{|\lambda_i|^2 : i = 1, \ldots, n\}$, be placed in nondecreasing order and denote the resulting ordered values by $\nu_1^2 \le \nu_2^2 \le \cdots \le \nu_n^2$. Let the square roots of the sums of squares of the absolute values of the entries in the rows $\{(\sum_{k=1}^{n} |a_{ik}|^2)^{1/2} : i = 1, \ldots, n\}$ (or the columns) be placed in nondecreasing order and denote the resulting ordered values by $R_1 \le R_2 \le \cdots \le R_n$. Show that

$$\sum_{i=1}^{k} \nu_i^2 \le \sum_{i=1}^{k} R_i^2 \qquad \text{for } k = 1, \ldots, n$$

with a similar upper bound involving the column sums of squares. Hint: The quantities $\nu_i^2$ are the eigenvalues of the Hermitian matrix $AA^*$. What are the main diagonal entries of $AA^*$? Use majorization and Theorem (4.3.26). Consider $A^*A$ for the column sum inequalities.

Further Readings. For more information about majorization, see [MOI]. For a discussion of the general interlacing inequalities among the eigenvalues of principal submatrices [mentioned following Theorem (4.3.10)] see C. R. Johnson and H. A. Robinson, "Eigenvalue Inequalities for Principal Submatrices," Lin. Alg. Appl. 37 (1981), 11-22. The argument outlined in Problem 4 is the original proof from H. Weyl, "Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (mit einer Anwendung auf die Theorie der Hohlraumstrahlung)," Math. Annalen 71 (1912), 441ff.; see the proof of the lemma on pp. 444-445. Weyl states and proves his result in terms of integral equations, but the translation to linear algebraic form is immediate.


4.4 Complex symmetric matrices

A matrix $A \in M_n$ is symmetric if $A = A^T$. In many instances, symmetric matrices under study also have only real entries, so they are real Hermitian matrices and all the results discussed so far in this chapter apply to them.

There are some circumstances in which one confronts complex symmetric matrices, however. One example is in the study of regular analytic


mappings of the unit disc in the complex plane. If $f(z)$ is a regular analytic function on the unit disc, and if $f(z)$ is normalized so that $f(0) = 0$ and $f'(0) = 1$, then $f(z)$ is one-to-one (sometimes called univalent or schlicht) if and only if

$$\sum_{i,j=1}^{n} x_i \bar x_j \log\frac{1}{1 - z_i \bar z_j} \ \ge\ \left| \sum_{i,j=1}^{n} x_i x_j \log\left( \frac{z_i z_j}{f(z_i) f(z_j)} \cdot \frac{f(z_i) - f(z_j)}{z_i - z_j} \right) \right| \qquad (4.4.1)$$

for all choices of points $z_1, \ldots, z_n \in \mathbb{C}$ with $|z_i| < 1$, all choices of points $x_1, \ldots, x_n \in \mathbb{C}$, and all $n = 1, 2, \ldots$. If $z_i = z_j$, the difference quotient on the right-hand side is to be interpreted as $f'(z_i)$. These formidable inequalities, known as the Grunsky inequalities, have the very simple algebraic form

$$x^*Ax \ge |x^TBx| \qquad (4.4.2)$$

where $x = [x_i] \in \mathbb{C}^n$, $A = [a_{ij}] \in M_n$, $B = [b_{ij}] \in M_n$,

$$a_{ij} = \log\frac{1}{1 - \bar z_i z_j}, \qquad b_{ij} = \log\left( \frac{z_i z_j}{f(z_i) f(z_j)} \cdot \frac{f(z_i) - f(z_j)}{z_i - z_j} \right)$$

Notice that $A$ is Hermitian and $B$ is complex symmetric.

Another example in which complex symmetric matrices arise naturally is in the general area of moment problems. Let $\{a_0, a_1, a_2, \ldots\}$ be a given sequence of complex numbers, let $n \ge 1$ be a given positive integer, and define $A_{n+1} = [a_{i+j}]_{i,j=0}^{n} \in M_{n+1}$. Notice that $A_{n+1}$ is a complex symmetric matrix of a form known as a Hankel matrix. We consider the complex quadratic form $x^TA_{n+1}x$ for $x \in \mathbb{C}^{n+1}$ and ask whether there is some fixed constant $c > 0$ such that

$$|x^TA_{n+1}x| \le c\, x^*x \qquad \text{for all } x \in \mathbb{C}^{n+1} \text{ and all } n = 1, 2, \ldots$$

According to a theorem of Nehari, this condition is satisfied if and only if there is a Lebesgue measurable and almost everywhere bounded function $F(t): \mathbb{R} \to \mathbb{C}$ whose Fourier coefficients are the given numbers $a_0, a_1, a_2, \ldots$; the essential bound on $F(t)$ is exactly the constant $c$ in the preceding inequalities.

Complex symmetric matrices do not seem to occur in applications nearly as often as complex Hermitian (or real symmetric) matrices, but the preceding examples show that they do occur. Although a complex symmetric matrix need not be diagonalizable (see Problem 15 at the end of this section), there is a factorization for complex symmetric matrices that is analogous to the spectral theorem (4.1.5) for Hermitian matrices, and it can be proved in a logically similar way. We first prove an analog of Schur's triangularization theorem (2.3.1) to show that a class of matrices that includes the symmetric matrices can always be factored as $A = U\Delta U^T$, where $U$ is unitary and $\Delta$ is upper triangular. If an upper triangular matrix is symmetric, it must be diagonal.

4.4.3 Theorem. Let $A \in M_n$ be given. There exists a unitary $U \in M_n$ and an upper triangular $\Delta \in M_n$ such that $A = U\Delta U^T$ if and only if all the eigenvalues of $A\bar A$ are real and nonnegative. Under this condition, all the main diagonal entries of $\Delta$ may be chosen to be nonnegative.

Proof: If $A = U\Delta U^T$, then $A\bar A = U\Delta U^T \bar U \bar\Delta U^* = U\Delta\bar\Delta U^*$ since $U$ is unitary and $U^T = \bar U^{-1}$. The main diagonal entries of the upper triangular matrix $\Delta\bar\Delta$ are nonnegative real numbers whenever $\Delta$ is upper triangular, and $A\bar A$ is unitarily similar to $\Delta\bar\Delta$, so the necessity of the condition follows from the fact that the eigenvalues of an upper triangular matrix are exactly its main diagonal entries.

For the converse, assume that $A\bar A$ has only nonnegative eigenvalues, and let $x$ be an eigenvector of $A\bar A$; that is, $A\bar Ax = \lambda x$ with $\lambda \ge 0$ and $x \ne 0$. There are two possibilities:

(a) $A\bar x$ and $x$ are dependent; or
(b) $A\bar x$ and $x$ are independent.

In the former case (a) (which always happens when $\lambda$ is a simple eigenvalue of $A\bar A$), there is some $\mu \in \mathbb{C}$ such that $A\bar x = \mu x$. But then $A\bar Ax = A\bar\mu\bar x = \bar\mu A\bar x = \bar\mu\mu x = |\mu|^2x = \lambda x$, so $|\mu|^2 = \lambda$. In the latter case (b) (which could happen if $\lambda$ is a multiple eigenvalue of $A\bar A$), the vector $y = A\bar x + \mu x$ is nonzero for all $\mu \in \mathbb{C}$, and we choose $\mu$ to be any complex number such that $|\mu|^2 = \bar\mu\mu = \lambda$. Then $A\bar y = A(\bar Ax + \bar\mu\bar x) = A\bar Ax + \bar\mu A\bar x = \lambda x + \bar\mu A\bar x = \bar\mu\mu x + \bar\mu A\bar x = \bar\mu(A\bar x + \mu x) = \bar\mu y$. In either case (a) or (b), we have shown there is some nonzero vector $v \in \mathbb{C}^n$ and some $\alpha \in \mathbb{C}$ with $|\alpha|^2 = \lambda$ such that $A\bar v = \alpha v$. Since this identity is unchanged if $v$ is multiplied by a positive scalar, we may also assume that $v$ is a unit vector. Also, for any $\theta \in \mathbb{R}$ we have $e^{-i\theta}A\bar v = A\overline{(e^{i\theta}v)} = e^{-i\theta}\alpha v = (e^{-2i\theta}\alpha)(e^{i\theta}v)$, and $e^{i\theta}v$ is a unit vector if $v$ is. Since we can choose $\theta$ so that $e^{-2i\theta}\alpha \ge 0$, we conclude that if $A \in M_n$ and if $\lambda$ is a nonnegative eigenvalue of $A\bar A$, then there exists a unit vector $v$ such that $A\bar v = \sigma v$, and $\sigma = +\sqrt{\lambda} \ge 0$.

Now extend this vector $v$ to an orthonormal basis $\{v, v_2, \ldots, v_n\}$ of $\mathbb{C}^n$, and let $V_1$ be the unitary matrix that has these vectors as columns. The first column of the matrix $V_1^*A\bar V_1$ has entries $v_i^*A\bar v = \sigma v_i^*v = \sigma\delta_{i1}$ because of orthonormality and the relation $A\bar v = \sigma v$. Thus, all but the first of the entries in the first column of $V_1^*A\bar V_1$ must be zero (the first entry might also be zero). If we write this matrix in partitioned form as


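$$V_1^*A\bar V_1 = \begin{bmatrix} \sigma & w^T \\ 0 & A_2 \end{bmatrix}, \qquad w \in \mathbb{C}^{n-1},\ A_2 \in M_{n-1},\ \sigma \ge 0 \qquad (4.4.3a)$$

we see that

$$(V_1^*A\bar V_1)(\overline{V_1^*A\bar V_1}) = V_1^*A\bar AV_1 = \begin{bmatrix} \sigma^2 & \sigma\bar w^T + w^T\bar A_2 \\ 0 & A_2\bar A_2 \end{bmatrix}$$

The eigenvalues of $A\bar A$ (all nonnegative by assumption) are therefore $\sigma^2$ together with the eigenvalues of $A_2\bar A_2$. We conclude that the matrix $A_2 \in M_{n-1}$ obtained by this process of reduction also has the property that all the eigenvalues of $A_2\bar A_2$ are nonnegative.

The process of reduction can now be repeated with $A_2$ and its successors at most $n-1$ times [just as in the proof of the Schur triangularization theorem (2.3.1)] to obtain

$$V_{n-1}^* \cdots V_2^*V_1^*\, A\, \bar V_1\bar V_2 \cdots \bar V_{n-1} = \begin{bmatrix} \sigma_1 & & * \\ & \ddots & \\ 0 & & \sigma_n \end{bmatrix} = \Delta$$

where $\Delta$ is upper triangular with nonnegative main diagonal entries $\sigma_i$. If we set $U = V_1V_2 \cdots V_{n-1}$, we have $A = U\Delta U^T$, as desired. □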
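
The first step of the converse argument, finding a unit vector $v$ and a scalar $\sigma \ge 0$ with $A\bar v = \sigma v$, can be sketched numerically as follows. This is an illustrative sketch only: the function name `coneigen_step`, the tolerance, and the choice to work with the first eigenpair returned by `numpy.linalg.eig` are assumptions for the example, and the code assumes that eigenvalue is real and nonnegative, as in the hypothesis of the theorem.

```python
import numpy as np

def coneigen_step(A, tol=1e-10):
    """Return (sigma, v): v a unit vector, sigma >= 0, with A @ conj(v) = sigma * v.

    Hypothetical helper following cases (a) and (b) of the proof; assumes the
    eigenvalue of A @ conj(A) that is used is real and nonnegative.
    """
    A = np.asarray(A, dtype=complex)
    lam_all, X = np.linalg.eig(A @ A.conj())
    lam = max(lam_all[0].real, 0.0)
    x = X[:, 0]
    w = A @ x.conj()
    mu = np.vdot(x, w) / np.vdot(x, x)       # best coefficient of w along x
    if np.linalg.norm(w - mu * x) <= tol * max(1.0, np.linalg.norm(w)):
        v, alpha = x, mu                      # case (a): A conj(x) = mu x
    else:
        mu = np.sqrt(lam)                     # case (b): any mu with |mu|^2 = lam
        v, alpha = w + mu * x, np.conj(mu)    # y = A conj(x) + mu x, A conj(y) = conj(mu) y
    theta = np.angle(alpha) / 2.0             # rotate so the multiplier becomes >= 0
    v = np.exp(1j * theta) * v / np.linalg.norm(v)
    return abs(alpha), v

A1 = np.array([[0, 1], [1, 0]], dtype=complex)   # A conj(A) = I has a double eigenvalue
s1, v1 = coneigen_step(A1)
A2 = np.array([[2, 1j], [1j, 1]])                # A conj(A) has distinct eigenvalues
s2, v2 = coneigen_step(A2)
print(np.allclose(A1 @ v1.conj(), s1 * v1), np.allclose(A2 @ v2.conj(), s2 * v2))
```

Repeating this step on the deflated block $A_2$ of (4.4.3a) is what produces the full factorization; the sketch stops after one step to keep the two cases visible.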


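Exercise. Explicitly carry out the calculations in the proof of Theorem (4.4.3) for the matrix $A = \begin{bmatrix} i & 1 \\ -1 & i \end{bmatrix}$ and show that $A = U\Delta U^T$ with

$$\Delta = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}, \qquad U = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$$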
If $n \ge 2$, not every matrix $A \in M_n$ has the property that $A\bar A$ has all nonnegative eigenvalues; $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is a simple example. Thus, Theorem (4.4.3) is only a partial analog of Schur's triangularization theorem (2.3.1). Every $A \in M_n$ can be triangularized by a transformation of the form $A \to UAU^*$ with a unitary $U \in M_n$, but only those matrices $A \in M_n$ such that $A\bar A$ has nonnegative eigenvalues can be triangularized by a transformation of the form $A \to UAU^T$ with a unitary $U \in M_n$.

Every symmetric matrix $A \in M_n$ has the property that all the eigenvalues of $A\bar A = AA^*$ are nonnegative, however. The special form that Theorem (4.4.3) takes in this case is commonly attributed to Schur (1945), but earlier proofs were offered by Hua (1944), Siegel (1943), and Jacobson (1939); historical priority must apparently be given to Takagi (1925).


4.4.4 Corollary (Takagi's factorization). If $A \in M_n$ is symmetric ($A = A^T$), then there exists a unitary $U \in M_n$ and a real nonnegative diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ such that $A = U\Sigma U^T$. The columns of $U$ are an orthonormal set of eigenvectors for $A\bar A$, and the corresponding diagonal entries of $\Sigma$ are the nonnegative square roots of the corresponding eigenvalues of $A\bar A$.


Proof: If $A = A^T$, then $\bar A = A^*$ and $A\bar A = AA^*$. If $x \ne 0$ is any eigenvector of the Hermitian matrix $AA^*$ and $AA^*x = \lambda x$, then $x^*\lambda x = \lambda(x^*x) = x^*AA^*x = (A^*x)^*(A^*x)$. Since $y^*y \ge 0$ for all $y \in \mathbb{C}^n$ with $y^*y = 0$ if and only if $y = 0$, we see that $\lambda = (A^*x)^*(A^*x)/x^*x \ge 0$. Thus, all the eigenvalues of $A\bar A$ are nonnegative whenever $A$ is symmetric. The theorem guarantees that there is a unitary $U \in M_n$ and an upper triangular $\Delta \in M_n$ with

$$\Delta = \begin{bmatrix} \sigma_1 & & * \\ & \ddots & \\ 0 & & \sigma_n \end{bmatrix}, \qquad \text{all } \sigma_i \ge 0$$

such that $A = U\Delta U^T$. But then $U\Delta U^T = A = A^T = U\Delta^TU^T$, so $\Delta = \Delta^T$, which can happen only if $\Delta = \Sigma$ is a diagonal matrix, which is nonnegative by construction. Finally, $A\bar A = U\Sigma U^T\bar U\Sigma U^* = U\Sigma^2U^*$ is a unitary diagonalization of the Hermitian matrix $A\bar A$, so the columns of $U$ are eigenvectors of $A\bar A$. □


Any matrix of the form $U\Lambda U^T$ with $\Lambda$ diagonal (not necessarily nonnegative) is evidently symmetric, so in order that a given matrix $A \in M_n$ can be factored as $A = U\Lambda U^T = U\Lambda\bar U^* = U\Lambda\bar U^{-1}$, with $U$ unitary and $\Lambda$ diagonal, it is necessary and sufficient that $A$ be symmetric. Conditions under which $A$ can be factored as $A = S\Lambda\bar S^{-1}$ with $\Lambda$ diagonal and $S$ nonsingular (but not necessarily unitary) are given in Theorem (4.6.11).

Every complex matrix $A \in M_n$ can be written in the form $A = V\Sigma W^*$, where $V, W \in M_n$ are unitary and $\Sigma$ is a diagonal matrix with nonnegative main diagonal entries. This is the singular value decomposition, and it is discussed in Section (7.3). The diagonal entries of $\Sigma$ are the singular values of $A$. The Takagi factorization $A = U\Sigma U^T$ for a (possibly complex) symmetric matrix is a special singular value decomposition for symmetric matrices in which $V = \bar W$.

The construction used in the proof of Theorem (4.4.3) can be used to compute a Takagi factorization of a complex symmetric matrix. The matrix $\Delta$ produced will automatically be diagonal because of the symmetry of $A$. See Problem 9 at the end of this section.


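Exercise. Explicitly carry out the calculations in the proof of Theorem (4.4.3) for the matrix $A = \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$ and show that $A = U\Delta U^T$ with

$$\Delta = \begin{bmatrix} \sqrt 2 & 0 \\ 0 & \sqrt 2 \end{bmatrix} \qquad \text{and} \qquad U = \frac{1}{\sqrt{4 + 2\sqrt 2}}\begin{bmatrix} 1 + \sqrt 2 & i \\ i & 1 + \sqrt 2 \end{bmatrix}$$

Notice that $\Delta$ is automatically diagonal.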


Because the columns of the unitary factor $U$ in the Takagi factorization $A = U\Sigma U^T$ are eigenvectors of the Hermitian matrix $A\bar A$, there may be a temptation to assume that if $A\bar A = U\Sigma^2U^*$ is a unitary diagonalization, then $A = U\Sigma U^T$. This need not be the case, as can be seen from consideration of the example $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Since $A\bar A = I$, we have $A\bar A = QI^2Q^T$ for any real orthogonal 2-by-2 matrix $Q$, but $QIQ^T = I \ne A$. The problem is that $A\bar A$ has an eigenvalue with multiplicity greater than 1, so an arbitrary eigenvector $x$ of $A\bar A$ might not have the property that $A\bar x = \alpha x$; this eigenvector might not give the desired reduction of $A$. If we consider the basis vector $e_1$, then $A\bar Ae_1 = Ie_1 = 1e_1$, but $A\bar e_1 = Ae_1 = e_2$; we have case (b) in the proof of Theorem (4.4.3). According to the proof, we may take $y = A\bar e_1 + 1e_1 = e_2 + e_1$ to obtain a vector $v = v_1 = (e_1 + e_2)/\sqrt 2$, which will reduce $A$. Since $v_2 = (e_1 - e_2)/\sqrt 2$ is orthogonal to $v_1$, we may take

$$V = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

and obtain $V^TAV = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix} = D^2$. Thus, if we set

$$U = VD = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & i \\ 1 & -i \end{bmatrix}$$

$A = UIU^T$ is an appropriate factorization of $A$. Notice that a Takagi factorization (4.4.4) of a real symmetric matrix might not have real factors.
torization (4.4.4) of a real symmetric matrix might not have real factoys 
The difficulty in the example just discussed, and in the general case, arises from multiple eigenvalues of $A\bar A$. If all the eigenvalues of $A\bar A$ are distinct, and if one uses the construction in the proof of (4.4.3) to compute a Takagi factorization of the complex symmetric matrix $A$, one always has case (a). (See Problem 9.) In this case, each eigenvector $x$ of $A\bar A$ has the property that $A\bar x = \alpha x$ for some $\alpha \in \mathbb{C}$ such that $\alpha = \sigma e^{2i\theta}$, $\theta \in \mathbb{R}$, and $A\bar Ax = \sigma^2x$. Thus, if $A\bar A = V\Sigma^2V^*$ is a unitary diagonalization of the Hermitian matrix $A\bar A$, we must have $A\bar V = V\Sigma D^2$, where $D^2 = \mathrm{diag}(e^{2i\theta_1}, \ldots, e^{2i\theta_n})$; this identity can be used to compute the diagonal entries of $D^2$ corresponding to nonzero diagonal entries of $\Sigma$ once $V$ and $\Sigma$ (the nonnegative square root of $\Sigma^2$) are known. The entries of $D^2$ corresponding to zero entries of $\Sigma$ are arbitrary and may be taken to be $+1$. Finally, we have $A = A\bar VV^T = V\Sigma D^2V^T = (VD)\Sigma(VD)^T = U\Sigma U^T$ if we set $U = VD$ and $D = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$. We record these observations formally as the following corollary.


4.4.5 Corollary. If $A \in M_n$ is symmetric, if the eigenvalues of $A\bar A$ are distinct, and if $A\bar A = V\Sigma^2V^*$ is a unitary diagonalization of $A\bar A$ with $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ and all $\sigma_i \ge 0$, then there exists a diagonal matrix $D = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$ with all $\theta_i \in \mathbb{R}$ such that $A = U\Sigma U^T$ with $U = VD$. The diagonal entries of the factor $D$ corresponding to nonzero diagonal entries of $\Sigma$ are determined by the relation $A\bar V = V\Sigma D^2$; the diagonal entries of $D$ corresponding to zero diagonal entries of $\Sigma$ may be taken to be $+1$.
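
The recipe in the corollary translates directly into a short computation. The following is a minimal sketch under the corollary's hypotheses (symmetric $A$, distinct eigenvalues of $A\bar A$); the function name `takagi_distinct` and the tolerance are assumptions for the example, and no claim is made that this is a numerically robust general-purpose algorithm.

```python
import numpy as np

def takagi_distinct(A, tol=1e-12):
    """Hypothetical helper: return (U, sigma) with A = U diag(sigma) U^T.

    Assumes A is complex symmetric and A @ conj(A) has distinct eigenvalues.
    """
    A = np.asarray(A, dtype=complex)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric (A == A.T)")
    # For symmetric A, A conj(A) = A A^* is Hermitian and positive semidefinite.
    lam, V = np.linalg.eigh(A @ A.conj())
    sigma = np.sqrt(np.clip(lam, 0.0, None))
    # The relation A conj(V) = V Sigma D^2 gives V^* A conj(V) = Sigma D^2.
    M = V.conj().T @ A @ V.conj()
    d2 = np.ones(len(sigma), dtype=complex)    # entries for sigma_i = 0 set to +1
    nz = sigma > tol
    d2[nz] = np.diag(M)[nz] / sigma[nz]
    d2[nz] /= np.abs(d2[nz])                   # remove rounding drift off the unit circle
    D = np.sqrt(d2)                            # principal square roots e^{i theta_j}
    U = V * D                                  # scales column j of V by D[j], i.e. U = V D
    return U, sigma

A = np.array([[2, 1j], [1j, 1]])
U, sigma = takagi_distinct(A)
print(np.allclose(U @ np.diag(sigma) @ U.T, A), np.allclose(U.conj().T @ U, np.eye(2)))
```

When $A\bar A$ has a repeated eigenvalue, as for the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ discussed earlier, the product $V^*A\bar V$ need not be diagonal and this shortcut does not apply.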

If $A \in M_n$ is symmetric, and if we use (4.4.4) to write $A = U\Sigma U^T$, then we can also write this as $A = (U\Sigma^{1/2})(U\Sigma^{1/2})^T$, where $\Sigma^{1/2} = \mathrm{diag}(+\sqrt{\sigma_1}, +\sqrt{\sigma_2}, \ldots, +\sqrt{\sigma_n})$. This observation constitutes a proof of the following corollary.


4.4.6 Corollary. Let $A \in M_n$. Then $A$ is symmetric if and only if there exists a matrix $S \in M_n$ such that $A = SS^T$. One may choose $S = UD$ where $U$ is unitary, $D = \mathrm{diag}(\sqrt{\sigma_1}, \sqrt{\sigma_2}, \ldots, \sqrt{\sigma_n})$, and $\{\sigma_i\}$ are the singular values of $A$, in which case $\mathrm{rank}\, S = \mathrm{rank}\, A$.

Although a real symmetric matrix is normal, a nonreal complex symmetric matrix need not be normal. If $A = B + iC \in M_n$ with $B$ and $C$ real, then $A$ is symmetric if and only if $B$ and $C$ are both real symmetric matrices. If $A$ is both symmetric and normal, then

$$AA^* = (B^2 + C^2) + i(CB - BC) = (B^2 + C^2) + i(BC - CB) = A^*A$$

from which it follows that $B$ and $C$ commute. In this event, $B$ and $C$ are simultaneously diagonalizable by a real orthogonal matrix $Q$. If $B = QD_1Q^T$ and $C = QD_2Q^T$ with $D_1$ and $D_2$ real diagonal matrices, then $A = B + iC = QD_1Q^T + iQD_2Q^T = Q(D_1 + iD_2)Q^T = Q\Lambda Q^T$ with $\Lambda = D_1 + iD_2$.

Conversely, if a matrix $A \in M_n$ can be written as $A = Q\Lambda Q^T$ with $Q$ a real orthogonal matrix and $\Lambda$ a diagonal matrix, then $A = A^T$ and $AA^* = Q\Lambda Q^TQ\bar\Lambda Q^T = Q|\Lambda|^2Q^T = Q\bar\Lambda Q^TQ\Lambda Q^T = A^*A$, so $A$ is both symmetric and normal. This proves the following theorem.


4.4.7 Theorem. Let $A \in M_n$. Then $A$ is both symmetric and normal if and only if there is a real orthogonal matrix $Q \in M_n(\mathbb{R})$ and a diagonal matrix $\Lambda \in M_n$ such that $A = Q\Lambda Q^T$.
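
One direction of the theorem is easy to check numerically. The sketch below, which uses an arbitrarily chosen random orthogonal $Q$ and random complex diagonal $\Lambda$ (both assumptions made for illustration), forms $A = Q\Lambda Q^T$ and tests symmetry, normality, and the commutativity of the real and imaginary parts.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # real orthogonal factor
Lam = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
A = Q @ Lam @ Q.T                                   # A = Q Lambda Q^T

is_symmetric = np.allclose(A, A.T)
is_normal = np.allclose(A @ A.conj().T, A.conj().T @ A)
B, C = A.real, A.imag                               # A = B + iC with B, C real
parts_commute = np.allclose(B @ C, C @ B)
print(is_symmetric, is_normal, parts_commute)
```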

A useful example of a simple complex matrix that is both symmetric and normal is

$$S = \frac{1}{\sqrt 2}(I + iB) \qquad (4.4.8)$$

where $B$ is the "backward identity" matrix, with ones on the main antidiagonal and zeros elsewhere, which played a role in (3.2.3) in showing that every matrix is similar to its transpose. Since $B^2 = I$,

$$S\bar S = \tfrac12 (I + iB)(I - iB) = \tfrac12 (I - iB + iB + B^2) = I$$

and we see that $S$ is both symmetric and unitary.
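Now consider a typical Jordan block $J_k(0)$ with zero main diagonal and $k \ge 2$, which we write in the form

$$N = \begin{bmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} \in M_k$$

It is a simple computation to show that $BNB = N^T$ has ones on the first subdiagonal and zeros elsewhere, that $BN$ has ones in the positions $(i, j)$ with $i + j = k + 2$ and zeros elsewhere, and that $NB$ has ones in the positions $(i, j)$ with $i + j = k$ and zeros elsewhere. Thus, $N$ is unitarily similar to the matrix

$$SNS^{-1} = SN\bar S = \tfrac12 (I + iB)N(I - iB) = \tfrac12 (N + BNB) + \tfrac{i}{2}(BN - NB) \qquad (4.4.8a)$$

which is evidently symmetric, since each of $N + BNB$, $BN$, and $NB$ is symmetric. Any Jordan block $J_k(\lambda)$ with $k \ge 2$ is of the form $\lambda I + N$, and $SJ_k(\lambda)S^{-1} = S(\lambda I + N)S^{-1} = \lambda I + SNS^{-1}$ is symmetric since $SNS^{-1}$ is symmetric.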
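
The computation above can be checked numerically for a particular block. In this sketch the helper names `backward_identity`, `s_matrix`, and `jordan_block`, the block size $k = 4$, and the eigenvalue $2 - i$ are all choices made for illustration.

```python
import numpy as np

def backward_identity(k):
    """Ones on the main antidiagonal, zeros elsewhere."""
    return np.fliplr(np.eye(k))

def s_matrix(k):
    """S = (I + iB) / sqrt(2) as in (4.4.8)."""
    return (np.eye(k) + 1j * backward_identity(k)) / np.sqrt(2)

def jordan_block(lam, k):
    """J_k(lam) = lam I + N with ones on the first superdiagonal."""
    return lam * np.eye(k, dtype=complex) + np.diag(np.ones(k - 1), 1)

k, lam = 4, 2 - 1j
B = backward_identity(k)
S = s_matrix(k)
N = jordan_block(0, k)
J = jordan_block(lam, k)

M = S @ J @ S.conj().T                      # S J S^{-1}, since S^{-1} = S^* = conj(S)
expected = lam * np.eye(k) + 0.5 * (N + B @ N @ B) + 0.5j * (B @ N - N @ B)

print(np.allclose(S @ S.conj().T, np.eye(k)))   # S is unitary
print(np.allclose(S, S.T))                      # S is symmetric
print(np.allclose(M, M.T))                      # S J S^{-1} is symmetric
print(np.allclose(M, expected))                 # agrees with (4.4.8a) shifted by lam I
```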


Every matrix $A \in M_n$ is similar to a Jordan canonical form $J$ of the form (3.1.14) with $\epsilon = 2$, and $J = J_{n_1}(\lambda_1, 2) \oplus \cdots \oplus J_{n_k}(\lambda_k, 2)$ is a direct sum of modified Jordan blocks $J_{n_i}(\lambda_i, 2)$. This observation (equivalent to replacing $N$ by $2N$ in the preceding argument) permits us to drop the coefficient factors of $\tfrac12$ in (4.4.8a). If we let $S_{n_i} = (1/\sqrt 2)(I + iB) \in M_{n_i}$ be the $n_i$-by-$n_i$ matrix of the form (4.4.8) if $n_i \ge 2$ and $S_1 = [1]$, and if we set $T = S_{n_1} \oplus \cdots \oplus S_{n_k}$, then the preceding argument shows that

$$TJT^{-1} = TJ\bar T = (S_{n_1}J_{n_1}(\lambda_1, 2)\bar S_{n_1}) \oplus \cdots \oplus (S_{n_k}J_{n_k}(\lambda_k, 2)\bar S_{n_k})$$

is a direct sum of symmetric matrices and is therefore symmetric. The matrix $T$ is unitary since each $S_{n_i}$ is unitary, so we have shown that every matrix in Jordan canonical form is unitarily equivalent to a symmetric matrix. Since every matrix is similar to a Jordan matrix, we have proved that every matrix $A \in M_n$ is similar to a symmetric matrix, namely a direct sum of symmetric blocks of the form

$$S_k(\lambda) \equiv SJ_k(\lambda, 2)\bar S = \lambda I + 2SN\bar S = \lambda I + (N + BNB) + i(BN - NB) \in M_k$$

In particular,

$$S_1(\lambda) = [\lambda] \qquad \text{and} \qquad S_2(\lambda) = \begin{bmatrix} \lambda - i & 1 \\ 1 & \lambda + i \end{bmatrix}$$

Since this form was derived from the Jordan canonical form, its uniqueness is the same as that of the Jordan canonical form.


One of the consequences of this result is that there is nothing special about the spectrum, Jordan blocks, minimal polynomial, characteristic polynomial, or invariant factors of a symmetric complex matrix. Any of these quantities can occur for a symmetric matrix of a given size that can occur for a general complex matrix of the same size. Every similarity class in $M_n$ contains a symmetric matrix, every linear transformation on $\mathbb{C}^n$ has a symmetric basis representation, and symmetry of a matrix is just an artifact of the particular basis chosen to represent the underlying linear transformation. Another consequence is that every matrix is "diagonalizable" in a certain sense.


4.4.10 Corollary. Let $A \in M_n$ be given. There is a nonsingular matrix $S$ and a unitary matrix $U$ such that $(US)A(\bar US)^{-1}$ is a diagonal matrix with nonnegative diagonal entries.

Proof: Use (4.4.9) to find a nonsingular $S \in M_n$ such that $SAS^{-1}$ is symmetric and then use (4.4.4) to find a unitary $U \in M_n$ such that $U(SAS^{-1})U^T$ is diagonal and nonnegative. □


Theorem (4.4.9) also implies that every complex matrix is similar to 
its transpose and can be written as a product of two complex symmetric 
matrices. Both of these results are true for matrices over any field, but 
Theorem (4.4.9) is not true for general fields. 


4.4.11 Corollary. Let $A \in M_n$ be given. There are matrices $B, C \in M_n$ such that $B = B^T$, $C = C^T$, and $A = BC$. Either $B$ or $C$ may be chosen to be nonsingular.

Proof: Use the theorem to write $A = SES^{-1}$, where $E = E^T$ and $S$ is nonsingular. Then $A = (SES^T)(S^T)^{-1}S^{-1} = (SES^T)(SS^T)^{-1} = BC$, where $B = SES^T$ and $C = (SS^T)^{-1}$ are both symmetric. Since $A = (SS^T)\left((S^{-1})^TES^{-1}\right)$ as well, either factor $B$ or $C$ may be chosen to be nonsingular. □
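
For a diagonalizable matrix the factorization in the proof can be carried out with an ordinary eigendecomposition, because a diagonal matrix $E$ is trivially symmetric. The sketch below makes that simplifying assumption (a randomly generated matrix, which is diagonalizable with probability 1); it does not handle the general case, which requires the symmetric canonical form.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

lam, S = np.linalg.eig(A)        # A = S E S^{-1}, assuming A is diagonalizable
E = np.diag(lam)                 # diagonal, hence symmetric

B = S @ E @ S.T                  # B = S E S^T
C = np.linalg.inv(S @ S.T)       # C = (S S^T)^{-1}

print(np.allclose(B, B.T), np.allclose(C, C.T), np.allclose(B @ C, A))
```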


The Gram-Schmidt process (0.6.4) has many applications in the study 
of normal matrices. There is an analogous process that is useful in the 
study of complex symmetric matrices. 


4.4.12 Lemma. Let $x_1, \ldots, x_k \in \mathbb{C}^n$ be given vectors with $k \le n$. There exist vectors $y_1, \ldots, y_k$ such that $\mathrm{Span}\{x_1, \ldots, x_k\} = \mathrm{Span}\{y_1, \ldots, y_k\}$, $y_i^Ty_j = 0$ for all $i, j = 1, 2, \ldots, k$ with $i \ne j$, $y_i^Ty_i = 1$ for $i = 1, 2, \ldots, r$, and $y_i^Ty_i = 0$ for $i = r+1, \ldots, k$, where $r = \mathrm{rank}\, X^TX$ and $X = [x_1 \cdots x_k] \in M_{n,k}$ is the matrix whose columns are the given vectors $\{x_i\}$.

Proof: Because the matrix $X^TX$ is symmetric, Takagi's factorization theorem (4.4.4) permits us to write $X^TX = U\Sigma U^T$, where $U \in M_k$ is unitary and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_k$, and $\mathrm{rank}\, X^TX = r$. If we set $D = \mathrm{diag}(\sqrt{\sigma_1}, \ldots, \sqrt{\sigma_r}, 1, \ldots, 1) \in M_k$ and write $I_r = \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0) \in M_k$ with $r$ 1's and $k - r$ 0's, then $X^TX = (UD)I_r(UD)^T = S^TI_rS$, where $S = DU^T$ is nonsingular. Thus, $(XS^{-1})^T(XS^{-1}) = I_r$, so if we set $XS^{-1} = Y = [y_1 \cdots y_k] \in M_{n,k}$, the column vectors $y_1, \ldots, y_k$ have the asserted properties since $Y^TY = I_r$. □


The preceding lemma states a principle that is formally similar to the Gram-Schmidt process, which involves $X^*X$ instead of $X^TX$. In the Gram-Schmidt process, however, each $y_j$ can be formed as a linear combination of $x_1, \ldots, x_j$ for each $j = 1, 2, \ldots, k$ and that may not be possible here. Another difference is that the number of vectors $y_i$ for which $y_i^*y_i = 1$ in the Gram-Schmidt process is equal to $\mathrm{rank}\, X$ (the maximum number of independent vectors $x_i$), which is always equal to $\mathrm{rank}\, X^*X$. In this case, however, the number of vectors $y_i$ for which $y_i^Ty_i = 1$ is equal to $\mathrm{rank}\, X^TX$, which can be less than $\mathrm{rank}\, X$.


Example. Consider $k = 1$ and $x_1 = X = \begin{bmatrix} 1 \\ i \end{bmatrix}$. Then $X^TX = 0$, so $0 = \mathrm{rank}\, X^TX$, which is strictly less than $\mathrm{rank}\, X = 1$. The only possibility for $y_1$ is a scalar multiple of $x_1$, so it is not possible to choose $y_1$ so that $\mathrm{Span}\{x_1\} = \mathrm{Span}\{y_1\}$ and $y_1^Ty_1 = 1$.


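Example. Consider $k = 2$ and $X = [x_1\ x_2] = \begin{bmatrix} 1 & 1 \\ i & 1 \end{bmatrix}$. Then $\mathrm{rank}\, X^TX = 2$ and there exist vectors $y_1, y_2$ such that $\mathrm{Span}\{y_1, y_2\} = \mathrm{Span}\{x_1, x_2\}$ and $y_1^Ty_1 = 1 = y_2^Ty_2$. Since $x_1^Tx_1 = 0$, it is not possible to choose $y_1$ to be a scalar multiple of $x_1$.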
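
The gap between $\mathrm{rank}\, X^TX$ and $\mathrm{rank}\, X$ is easy to see numerically. This small sketch assumes the isotropic vector $[1, i]^T$ as the single column of $X$ and compares the bilinear product $X^TX$ with the Hermitian product $X^*X$ used in the Gram-Schmidt process.

```python
import numpy as np

X = np.array([[1.0], [1.0j]])          # single column x_1 = [1, i]^T (assumed example)

gram_T = X.T @ X                       # bilinear form x^T x, no conjugation
gram_H = X.conj().T @ X                # Hermitian form x^* x

rank_T = int(np.linalg.matrix_rank(gram_T))
rank_H = int(np.linalg.matrix_rank(gram_H))
rank_X = int(np.linalg.matrix_rank(X))
print(gram_T[0, 0], gram_H[0, 0], rank_T, rank_H, rank_X)
```

Here $x_1^Tx_1 = 1 + i^2 = 0$ while $x_1^*x_1 = 2$, so no rescaling of $x_1$ can satisfy $y_1^Ty_1 = 1$.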


The immediate application we have in mind is to the special situation of diagonalizable complex symmetric matrices. If $A = A^T \in M_n$ and if $A = S\Lambda S^{-1}$ for a diagonal $\Lambda \in M_n$ and a nonsingular $S \in M_n$, then it is not evident from this usual diagonalization representation that $A$ is symmetric. If $S$ is a complex orthogonal matrix, however, then $S^{-1} = S^T$ and $A = S\Lambda S^{-1} = S\Lambda S^T$ is evidently symmetric. The next theorem says that it is always possible to choose $S$ to be complex orthogonal.
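
When a symmetric matrix has distinct eigenvalues, a complex orthogonal diagonalizing matrix can be obtained simply by rescaling each eigenvector $s_i$ so that $s_i^Ts_i = 1$. The sketch below makes that simplifying assumption; the function name `complex_orthogonal_diagonalize` and the tolerance are hypothetical, and repeated eigenvalues would need the block construction based on Lemma (4.4.12).

```python
import numpy as np

def complex_orthogonal_diagonalize(A, tol=1e-12):
    """Hypothetical helper: return (lam, Q) with Q^T Q = I and A = Q diag(lam) Q^T.

    Assumes A is complex symmetric with distinct eigenvalues.
    """
    A = np.asarray(A, dtype=complex)
    if not np.allclose(A, A.T):
        raise ValueError("A must be symmetric (A == A.T)")
    lam, S = np.linalg.eig(A)
    q = np.einsum("ij,ij->j", S, S)        # bilinear products s_j^T s_j (no conjugation)
    if np.any(np.abs(q) <= tol):
        raise ValueError("an eigenvector satisfies s^T s = 0; assumptions violated")
    Q = S / np.sqrt(q)                     # divide column j by sqrt(s_j^T s_j)
    return lam, Q

A = np.array([[2, 1j], [1j, 1]])
lam, Q = complex_orthogonal_diagonalize(A)
print(np.allclose(Q.T @ Q, np.eye(2)), np.allclose(Q @ np.diag(lam) @ Q.T, A))
```

Eigenvectors belonging to different eigenvalues of a symmetric matrix automatically satisfy $s_i^Ts_j = 0$, which is why only the diagonal of $S^TS$ needs to be normalized in this case.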


4.4.13 Theorem. Let $A \in M_n$ be symmetric. Then $A$ is diagonalizable if and only if it is complex orthogonally diagonalizable, that is, $A = S\Lambda S^{-1}$ for a diagonal $\Lambda \in M_n$ and a nonsingular $S \in M_n$ if and only if $A = Q\Lambda Q^T$, where $Q \in M_n$ satisfies $Q^TQ = I$.

Proof: Suppose $A = A^T$ and let $x, y \in \mathbb{C}^n$ be eigenvectors of $A$ with $Ax = \lambda x$ and $Ay = \mu y$. If $\lambda \ne \mu$, then $y^TAx = y^T(\lambda x) = \lambda y^Tx$ and $y^TAx = (Ay)^Tx = (\mu y)^Tx = \mu y^Tx$. Thus $\lambda y^Tx = \mu y^Tx$ and $y^Tx = 0$ since $\lambda \ne \mu$. This is just an application of the principle of biorthogonality (1.4.7) to symmetric matrices. If $A$ is diagonalizable and $A = S\Lambda S^{-1}$, there is no loss of generality to assume that the like eigenvalues of $A$ are grouped together in $\Lambda = \Lambda_1 \oplus \cdots \oplus \Lambda_d$ with $\Lambda_i = \lambda_iI \in M_{n_i}$, $n_1 + \cdots + n_d = n$, and $\lambda_i \ne \lambda_j$ if $i \ne j$. Partition the columns of $S = [s_1 \cdots s_n] = [S_1\ S_2\ \cdots\ S_d]$ conformally with $\Lambda = \Lambda_1 \oplus \cdots \oplus \Lambda_d$, so that $S_i \in M_{n,n_i}$ for $i = 1, 2, \ldots, d$. Because of the biorthogonality property, $S_i^TS_j = 0 \in M_{n_i,n_j}$ if $i \ne j$, and $S_i^TS_i$ is nonsingular for all $i = 1, 2, \ldots, d$ because $S^TS$ is nonsingular and block diagonal. Since each matrix $S_i^TS_i$ has full rank, Lemma (4.4.12) says that the columns of each $S_i$ can be replaced by new columns, which are a nonsingular linear combination of the old ones and are mutually complex orthogonal; that is, there exists a nonsingular $R_i \in M_{n_i}$ such that $Q_i = S_iR_i$ satisfies $Q_i^TQ_i = R_i^TS_i^TS_iR_i = I \in M_{n_i}$. Since $Q_i^TQ_j = R_i^TS_i^TS_jR_j = 0$ for all $i \ne j$ and $AQ_i = AS_iR_i = \lambda_iS_iR_i = \lambda_iQ_i$ for $i = 1, 2, \ldots, d$, the matrix $Q = [Q_1 \cdots Q_d] \in M_n$ is complex orthogonal and $A = Q\Lambda Q^T$. □

The preceding result provides an interesting setting for Theorem (4.4.7): A symmetric matrix $A$ is diagonalizable if and only if $A = Q\Lambda Q^T$ with $Q$ complex orthogonal, and it is normal if and only if $Q$ can be chosen to be real.

The result in Theorem (4.4.13) can be generalized somewhat. If $A, B \in M_n$ are symmetric matrices, then $A$ and $B$ are similar if and only if they are similar via a complex orthogonal similarity. In fact, this is true under the weaker hypothesis that there is one polynomial $p(t)$ such that $A^T = p(A)$ and $B^T = p(B)$. See [HJ].


Problems

1. Suppose $A \in M_n$ is symmetric and $A = B + iC$ with $B, C \in M_n$ both real. Show that $A$ is normal if and only if $B$ and $C$ commute. Show that $A$ is normal if and only if $A\bar A$ is real. Show that $A$ is normal if and only if $A$ and $\bar A$ commute. Give an example of a symmetric matrix that is not normal.

$$U^TAU = \begin{bmatrix} 0 & 0 \\ 0 & A' \end{bmatrix}, \qquad A' \in M_{n-k}$$

with $A'$ nonsingular and symmetric. Thus, without loss of generality, we may assume that $A$ is nonsingular. Let $A = B + iC$ with $B, C$ real and let $z = x + iy \in \mathbb{C}^n$ with $x, y \in \mathbb{R}^n$. Let $F = \begin{bmatrix} B & C \\ C & -B \end{bmatrix}$ and $\hat z = \begin{bmatrix} x \\ -y \end{bmatrix} \in \mathbb{R}^{2n}$. (a) $B$, $C$, and $F$ are real symmetric matrices. Discuss the relationship between $Az = (B + iC)(x + iy)$ and $F\hat z$. (b) $F$ is nonsingular. Hint: If $F\hat z = 0$, what is $Az$? (c) If $F\begin{bmatrix} x \\ -y \end{bmatrix} = \lambda\begin{bmatrix} x \\ -y \end{bmatrix}$, then $F\begin{bmatrix} y \\ x \end{bmatrix} = -\lambda\begin{bmatrix} y \\ x \end{bmatrix}$. The nonzero eigenvalues of $F$ can be paired with one positive and one negative. (d) Let the orthonormal eigenvectors of $F$ corresponding to the positive eigenvalues $\lambda_1, \ldots, \lambda_n$ be denoted $\hat z_i = \begin{bmatrix} x_i \\ -y_i \end{bmatrix} \in \mathbb{R}^{2n}$, $i = 1, 2, \ldots, n$, let $X = [x_1 \cdots x_n]$, $Y = [y_1 \cdots y_n] \in M_n$, and let $\Sigma = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \in M_n$. The spectral theorem for real symmetric matrices says that $F = V\Lambda V^T$, where

$$V = \begin{bmatrix} X & Y \\ -Y & X \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{bmatrix}$$

and $V$ is a real orthogonal matrix (why?). Let $U = X - iY$. Show that $U$ is unitary and that $U\Sigma U^T = A$.

3. What does (4.4.4) say when $A$ is a real symmetric matrix? How is it related to the usual spectral decomposition of a real symmetric matrix? Hint: If $A = Q\Lambda Q^T$ with $\Lambda$ a real diagonal matrix and $Q$ a real orthogonal matrix, write $\Lambda = \Sigma D^2$ and let $U = QD$. When can all the factors in the Takagi factorization $A = U\Sigma U^T$ be taken to be real?

4. If $A = U\Sigma U^T \in M_n$ with $U$ and $\Sigma$ as in (4.4.4), show by direct computation that $\sigma_i^2$ are the eigenvalues of $A\bar A$ and $\bar AA$, and that $A\bar A$ and $\bar AA$ are Hermitian. Show that the columns $u_i$ of $U$ and the numbers $\sigma_i$ satisfy the equations $A\bar u_i = \sigma_iu_i$, $i = 1, 2, \ldots, n$. Perhaps for this reason, the $\sigma_i$ are sometimes called generalized eigenvalues, but the term singular values seems to be more common.

5. Let $A \in M_n$ be symmetric, let $\Sigma$ and $U$ be as in (4.4.4), and arrange the
singular values of A in nonincreasing order 0; = a) => ++ > o, = 0. (a) Mod- 
fy the proof of the Rayleigh-Ritz theorem (4.2.2) to show that dmax = 
g; = max{|x"Ax|/x*x:0#x eC"), that is, there is a complex symmet- 
ic analog of the upper bound in (4.2.2). Consider the first column of 
U to show that the extremum is obtained for a unit vector x such that 


AX = 0,x. (b) Consider A =I eM) and x= [ i | to show that opin = 0, # 
~min{|x"Ax|/x*x: 0 #x e C") in this case, so a complex symmetric analog 


2. Provide the details for the following outline for a different proof of 
Corollary (4.4.4). The notation and hypotheses are as in (4.4.4). If A is 
singular, let {u,,...,4,} be an orthonormal basis of the null space of A 
and let VU=[u,... uy Uki- Ha] EM, be unitary. Then 


4.4 Complex symmetric matrices 215 


214 Hermitian and symmetric matrices 


9. Using the notation of the proof of (4.4.3), show the following: (i) If 
` àis a simple eigenvalue of AA and x #0 is such that AAx =x, then 
: possibility (a) is always the case. Hint: Let o = +V) and set w = AX — 0x. 
- Show that AW = —ow and AAw= dw, so wis a scalar multiple of x. (ii) If 
`- A=A', then PAT, =[0]@ Ap; that is, the row vector w” in (4.4.3a) is 
zero. Use this to show that the construction automatically produces a 
` matrix Vj e VTAN D ---V,,_; = U*AU =A that is diagonal. 


of the lower bound in (4.2.2) is false. (c) Consider A =1 e€ M3, w= [ i] ; 
and show that max{|x7Ax|/x*x:0#xeC", x Lw}=0. Conclude that a 
complex symmetric matrix/singular value analog of the Courant-Fischer - 
min-max formula (4.2.12) is false for k>1. However, see (7.3.10). (d) - 
What about a symmetric analog of the max-min formula (4.2.13)? (e) Let - 
A= | ; i] (6, = & = V2) and form A = [1] (o; = 1) by deleting the last row - 
and column from A. Note that the interlacing inequality 6, = o; = ë, anal- 
ogous to (4.3.9) is not correct. (f) Nevertheless, there are inequalities for - 
the singular values of bordered symmetric matrices. Let A € Mp +; be sym- - 
metric and have singular values 6;=--- 26,4; and form Ae M, (with f 
singular values o, = +-+: = op) by deleting a row and corresponding column - 
from A. Use Theorem (7.3.9) to show that õp = 0,2 G42, KSl, N 
(G42 =0). Verify these inequalities for the example in (e) and compare - 
them with the interlacing inequalities (4.3.9) for eigenvalues of bordere 
Hermitian matrices. 


6. If $A \in M_n$ is symmetric and if $A = U\Sigma U^T$ with $U$ unitary and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ with all $\sigma_i \geq 0$, show that the rank of $A$ is equal to the number of nonzero $\sigma_i$ terms. Hint: If $B, C \in M_n$ are nonsingular, then rank $A$ = rank $BAC$.

7. Let $A = B + iC \in M_n$ with $B, C$ real, and let $F = \begin{bmatrix} B & C \\ C & -B \end{bmatrix} \in M_{2n}$. (a) Show that $\bar{A}A = B^2 + C^2 + i(BC - CB)$ and
\[ F^2 = \begin{bmatrix} B^2 + C^2 & BC - CB \\ CB - BC & B^2 + C^2 \end{bmatrix} \]
(b) Show that $S = \frac{1}{\sqrt{2}}\begin{bmatrix} I & -iI \\ -iI & I \end{bmatrix} \in M_{2n}$ is unitary. (c) Show that
\[ SF^2S^* = \begin{bmatrix} \bar{A}A & 0 \\ 0 & A\bar{A} \end{bmatrix} \]
(d) Conclude that the squares of the eigenvalues of $F$ are the same as the eigenvalues of $A\bar{A}$, together with their complex conjugates. (e) If $A$ is a complex symmetric matrix, show that $F$ is a real symmetric matrix with real eigenvalues, that $F^2$ has only nonnegative eigenvalues, and that the set of squares of the eigenvalues of $F$ is the same as the set of eigenvalues of the Hermitian matrix $A\bar{A}$.


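10. Let $A \in M_n$ and suppose there is a nonsingular $S \in M_n$ such that $A = S\Lambda\bar{S}^{-1}$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Show that $A\bar{A}$ is diagonalizable and has only nonnegative eigenvalues, and that rank $A$ = rank $A\bar{A}$. What does this have to do with (4.4.4)? Show that neither $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ nor $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ can be written in this form.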


11. If $S \in M_n$ is given, show that rank $S^TS \leq$ rank $S$ in general and that rank $S^TS <$ rank $S$ is possible. What happens if $S$ is real? Hint: Consider
\[ S = \begin{bmatrix} 1 & 0 \\ i & 0 \end{bmatrix} \]

12. If $A \in M_n$ is a complex symmetric matrix and if $x, y \in \mathbf{C}^n$ are eigenvectors of $A$ corresponding to distinct eigenvalues of $A$, show that $x^Ty = 0$. Does this mean that $x$ and $y$ are orthogonal? Hint: Consider $x^T(Ay) = (Ax)^Ty$.

13. If $A \in M_n$ is symmetric and has $n$ distinct eigenvalues, show directly that there exists a nonsingular matrix $S \in M_n$ and a diagonal matrix $D$ such that $A = SDS^T$. Hint: $A$ is diagonalizable, so $A = S\Lambda S^{-1}$ and $AS = S\Lambda$. By Problem 12, $S^TS = D$ is diagonal. Thus $S^TAS = S^TS\Lambda = D\Lambda$ and $A = (S^{-1})^T(D\Lambda)S^{-1}$. What adjustments need to be made to show that $A = Q\Lambda Q^T$ with a complex orthogonal $Q$?

14. If $A \in M_n$ is symmetric and nonsingular, show that $A^{-1}$ is symmetric.

15. A real symmetric matrix is Hermitian and therefore is diagonalizable. Show that a complex symmetric matrix need not be diagonalizable. Hint: Consider $A = \begin{bmatrix} 1 & i \\ i & -1 \end{bmatrix}$ and compute $A^2$.

8. Let $A \in M_n$ be a complex symmetric matrix. Consider the quadratic form $q_A(x, x) = x^TAx$ and the bilinear form $b_A(x, y) = x^TAy$ generated by $A$. Use Corollary (4.4.4) to show that
\[ \sup_{x^*x = 1} |q_A(x, x)| = \sup_{x^*x = 1,\ y^*y = 1} |b_A(x, y)| = \sigma_{\max}(A) \]
where $\sigma_{\max}(A)^2$ is the largest eigenvalue of $A\bar{A}$.

16. Let $A \in M_n$. Show that $A$ is both symmetric and unitary if and only if $A$ can be written as $A = Q\Lambda Q^T$, where $Q \in M_n(\mathbf{R})$ is a real orthogonal matrix and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$ with $|\lambda_k| = 1$ and $\theta_k \in \mathbf{R}$ for $k = 1, 2, \ldots, n$.

17. Use Problem 16 to show that a matrix $U \in M_n$ is unitary and symmetric if and only if there is a unitary matrix $V \in M_n$ such that $U = VV^T$.


18. We have shown that every matrix $A \in M_n$ is similar to a symmetric matrix. Is every matrix similar to a Hermitian matrix? To a normal matrix?

19. Use (4.4.9) to show that every matrix is similar to its transpose.

20. Show that Theorem (4.4.9) is not true over the real field; that is, show that not every matrix $A \in M_n(\mathbf{R})$ is similar to a real symmetric matrix.

21. A complex symmetric matrix $A$ can have an isotropic vector $v$ as an eigenvector; that is, $Av = \lambda v$, $v \neq 0$, and $v^Tv = 0$. If $A$ is diagonalizable, however, show that $\lambda$ cannot be a simple eigenvalue. Hint: Write $A = S\Lambda S^{-1}$ with $v$ the first column of $S$ and argue that $S^TS$ is singular because its first row is zero. In particular, if $v \in \mathbf{C}^n$ is any nonzero vector such that $v^Tv = 0$, the symmetric (rank 1) matrix $A = vv^T$ cannot be diagonalizable. See Problem 15.

22. Provide the details for the following outline for another proof of Corollary (4.4.4). The notation and hypotheses are as in (4.4.4). This is essentially Siegel's (1943) proof. (a) $A\bar{A}$ is Hermitian, so there is a unitary $V \in M_n$ and a real diagonal $\Lambda_1 \in M_n$ such that $A\bar{A} = V\Lambda_1V^*$. (b) $V^*A\bar{V} = B$ is both symmetric and normal, so by (4.4.7) there is a diagonal $\Lambda \in M_n$ and a real orthogonal matrix $Q \in M_n(\mathbf{R})$ such that $B = Q\Lambda Q^T$. (c) $A = (VQ)\Lambda(VQ)^T$. Now write $\Lambda = E\Sigma E^T$ with $E$, $\Sigma$ both diagonal and $\Sigma$ nonnegative to get $A = U\Sigma U^T$ with $U = VQE$ unitary.

23. Let $z = [z_1, z_2, \ldots, z_n]^T$ be a vector of $n$ complex variables and let $f(z)$ be a complex analytic function of $n$ complex variables in some domain $D \subset \mathbf{C}^n$. Because of the equality of the mixed partial derivatives, $H = [\partial^2f/\partial z_i\,\partial z_j]$ is symmetric at every point $z \in D$. The discussion in (4.0.3) shows that one may assume that the coefficient matrix $A = [a_{ij}]$ in the general linear partial differential operator
\[ Lf = \sum_{i,j=1}^{n} a_{ij}(z)\,\frac{\partial^2 f}{\partial z_i\,\partial z_j} \]
is symmetric. Show that at each point $z_0 \in D$ there is a unitary change of variables $z \to U\zeta$ such that in the new coordinates $Lf$ is diagonal at $z_0$, that is,
\[ Lf = \sum_{i=1}^{n} \sigma_i\,\frac{\partial^2 f}{\partial \zeta_i^2}, \quad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0 \quad \text{at } z = z_0 \]

24. Use (4.4.13) and an induction argument like the one used in the proof of (1.3.19) to prove the following analog of Theorem (4.1.6) on simultaneous unitary diagonalization of a family of Hermitian matrices: Let $\mathcal{F} \subset M_n$ be a given family of diagonalizable symmetric matrices. There exists a complex orthogonal matrix $Q$ such that $QAQ^T$ is diagonal for all $A \in \mathcal{F}$ if and only if $\mathcal{F}$ is a commuting family.

25. Use the argument in the proof of Theorem (4.4.7) to show that a matrix $A \in M_n$ is both skew-symmetric ($A = -A^T$) and normal if and only if there is a real orthogonal matrix $Q \in M_n(\mathbf{R})$ such that $Q^TAQ = 0 \oplus 0 \oplus \cdots \oplus 0 \oplus A_1 \oplus \cdots \oplus A_k$, where each $A_j \in M_2$ has the form
\[ A_j = \begin{bmatrix} 0 & z_j \\ -z_j & 0 \end{bmatrix}, \quad z_j \in \mathbf{C}, \quad j = 1, 2, \ldots, k \qquad (4.4.14) \]
Hint: Consider the real and imaginary parts of $A$ and use Theorem (2.5.15). When are the 1-by-1 zero direct summands absent?

26. Use Problem 25 and the argument in Problem 22 to prove a complex skew-symmetric analog of Takagi's factorization (4.4.4) for a complex symmetric matrix: A matrix $A \in M_n$ is skew-symmetric ($A = -A^T$) if and only if there is a unitary $U \in M_n$ such that
\[ A = U(0 \oplus \cdots \oplus 0 \oplus A_1 \oplus \cdots \oplus A_k)U^T \]
where each $A_j \in M_2$ has the form (4.4.14). In particular, conclude that a skew-symmetric complex matrix must have even rank.

27. Let $W \in M_n$ be a given unitary matrix. Show that there is a unitary $V \in M_n$ such that $V^2 = W$ and $V^TA = AV$ whenever $A \in M_n$ is such that $W^TA = AW$. Hint: If $W = U\Lambda U^*$ with $U$ unitary and $\Lambda = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$ with $0 \leq \theta_j < 2\pi$, consider the natural square root $\Lambda^{1/2} = \mathrm{diag}(e^{i\theta_1/2}, \ldots, e^{i\theta_n/2})$ and let $V = U\Lambda^{1/2}U^*$. Show that $W^TA = AW$ if and only if $\Lambda$ commutes with $U^TAU$. Use the argument in the proof of (1.3.12), or show that $V$ is a polynomial in $W$, and conclude that $\Lambda^{1/2}$ commutes with $U^TAU$ and hence $V^TA = AV$.

28. Provide the details for the following outline of yet another proof of Corollary (4.4.4). The notation and hypotheses are as in (4.4.4). This is essentially Hua's (1944) proof. Assume first that $A$ is nonsingular. (a) $A\bar{A}$ is Hermitian and positive definite ($x^*A\bar{A}x = (\bar{A}x)^*(\bar{A}x) \geq 0$ for all $x \in \mathbf{C}^n$), so there is a unitary $Z \in M_n$ and a nonnegative nonsingular diagonal $\Sigma \in M_n$ such that $A\bar{A} = Z\Sigma^2Z^*$. (b) $W = \Sigma^{-1}Z^*A\bar{Z}$ is unitary and $\Sigma W$ is symmetric, so $\Sigma W = W^T\Sigma$. (c) Use Problem 27 to show that there is a unitary $V \in M_n$ such that $V^2 = W$ and $\Sigma V = V^T\Sigma$. (d) $Z^*A\bar{Z} = \Sigma W = \Sigma V^2 = (\Sigma V)V = V^T\Sigma V$, so $A = (ZV^T)\Sigma(ZV^T)^T$. Let $U = ZV^T$. (e) If $A$ is singular, employ the argument at the beginning of Problem 2 to reduce to the nonsingular case.


Further Readings and Notes. For the original versions of Corollary (4.4.4) see T. Takagi, "On an Algebraic Problem Related to an Analytic Theorem of Carathéodory and Fejér and on an Allied Theorem of Landau," Japan. J. Math. 1 (1925), 83-93, as well as I. Schur, "Ein Satz über Quadratische Formen mit Komplexen Koeffizienten," Amer. J. Math. 67 (1945), 472-480. Other proofs were given by C. L. Siegel, "Symplectic Geometry," Amer. J. Math. 65 (1943), lemma 1, pp. 12, 14-15; L.-K. Hua, "On the Theory of Automorphic Functions of a Matrix Variable I - Geometric Basis," Amer. J. Math. 66 (1944), 470-488; and N. Jacobson, "Normal Semi-Linear Transformations," Amer. J. Math. 61 (1939), 45-58. The proof of (4.4.4) via the triangular reduction (4.4.3) is in Y. P. Hong and R. A. Horn, "On the Reduction of a Matrix to Triangular or Diagonal Form by Consimilarity," SIAM J. Algebraic and Discrete Methods 7 (1986), 80-88. For generalizations of (4.4.11) to an arbitrary field, see O. Taussky, "The Role of Symmetric Matrices in the Study of General Matrices," Linear Algebra Appl. 5 (1972), 147-154.


4.5 Congruence and simultaneous diagonalization of Hermitian 
and symmetric matrices 


Any real second-order linear partial differential operator $L$ can be written in the form
\[ Lf = \sum_{i,j=1}^{n} a_{ij}(x)\,\frac{\partial^2 f(x)}{\partial x_i\,\partial x_j} + \text{lower-order terms} \qquad (4.5.1) \]
where we assume that the coefficients $a_{ij}(x)$ are defined on some domain $D \subset \mathbf{R}^n$ and that the function $f$ is twice continuously differentiable on $D$. As we saw in (4.0.3), we may assume without loss of generality that the matrix of coefficients $A(x) = [a_{ij}(x)]$ is a real symmetric matrix for all $x \in D$. By "lower-order terms" we mean terms involving $f$ and its first partial derivatives only.

If we make a nonsingular change of independent variables to new variables $s = [s_i] \in D \subset \mathbf{R}^n$, then each $s_i = s_i(x) = s_i(x_1, \ldots, x_n)$, and nonsingularity means that the Jacobian matrix
\[ S(x) = \left[\frac{\partial s_i(x)}{\partial x_j}\right] \in M_n \]
is nonsingular at each point of $D$. This assumption guarantees that the inverse change of variables $x = x(s)$ exists locally. It is a straightforward application of the chain rule to show that, in these new coordinates, the operator $L$ has the form
\[ Lf = \sum_{i,j=1}^{n}\left[\sum_{p,q=1}^{n} \frac{\partial s_i}{\partial x_p}\,a_{pq}\,\frac{\partial s_j}{\partial x_q}\right]\frac{\partial^2 f}{\partial s_i\,\partial s_j} + \text{lower-order terms} \]
\[ \phantom{Lf} = \sum_{i,j=1}^{n} b_{ij}\,\frac{\partial^2 f}{\partial s_i\,\partial s_j} + \text{lower-order terms} \]
Thus, the new matrix of coefficients $B$ (in the coordinates $s = [s_i]$) is related to the old matrix of real coefficients $A$ (in the coordinates $x = [x_i]$) by the relation
\[ B = SAS^T \qquad (4.5.3^T) \]
where $S$ is a real nonsingular matrix.

If the differential operator $L$ is associated with some physical law (e.g., the Laplacian $L = \nabla^2$ and electrostatic potentials), the choice of coordinates for the independent variable should not affect the law, although it obviously affects the form of $L$. Thus, we are led to ask what the invariants are of the set of all matrices $B$ that are related to a given matrix $A$ by the relation $(4.5.3^T)$.

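Another example of a transformation like $(4.5.3^T)$ comes from probability and statistics. Suppose that $X_1, X_2, \ldots, X_n$ are real or complex random variables with finite second moments on some probability space with expectation operator $E$, and let $\mu_i = E(X_i)$ denote the respective means. The Hermitian matrix $A = [a_{ij}] = (E[(X_i - \mu_i)\overline{(X_j - \mu_j)}]) = \mathrm{Cov}(X)$ is the covariance matrix of the random vector $X = [X_1, \ldots, X_n]^T$. If $S = [s_{ij}] \in M_n$ is a given matrix, $SX$ is a random vector whose components are linear combinations of the components of $X$. The means of the components of $SX$ are
\[ E((SX)_i) = E\Big(\sum_{k=1}^{n} s_{ik}X_k\Big) = \sum_{k=1}^{n} s_{ik}E(X_k) = \sum_{k=1}^{n} s_{ik}\mu_k \]
and the covariance matrix of $SX$ is
\[ \mathrm{Cov}(SX) = \Big(E\big[((SX)_i - E((SX)_i))\,\overline{((SX)_j - E((SX)_j))}\big]\Big) \]
\[ = \Big(E\Big[\Big(\sum_{p=1}^{n} s_{ip}(X_p - \mu_p)\Big)\overline{\Big(\sum_{q=1}^{n} s_{jq}(X_q - \mu_q)\Big)}\Big]\Big) \]
\[ = \Big(\sum_{p,q=1}^{n} s_{ip}\,E\big[(X_p - \mu_p)\overline{(X_q - \mu_q)}\big]\,\bar{s}_{jq}\Big) = \Big(\sum_{p,q=1}^{n} s_{ip}\,a_{pq}\,\bar{s}_{jq}\Big) = SAS^* \]
This shows that
\[ \mathrm{Cov}(SX) = S\,\mathrm{Cov}(X)\,S^* \qquad (4.5.3^*) \]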
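The transformation law $(4.5.3^*)$ can be checked numerically. The following is a minimal sketch, assuming NumPy and a finite sample standing in for the expectation; the helper name `sample_cov` and the random test data are illustrative assumptions, not part of the text.

```python
import numpy as np

def sample_cov(X):
    """Sample version of E[(X - mu)(X - mu)^*]; rows of X are variables, columns are samples."""
    Xc = X - X.mean(axis=1, keepdims=True)      # subtract the mean of each component
    return Xc @ Xc.conj().T / X.shape[1]

rng = np.random.default_rng(0)
n, m = 3, 500                                   # n variables, m samples (arbitrary choices)
X = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

cov_SX = sample_cov(S @ X)                      # covariance of the transformed vector SX
star_law = S @ sample_cov(X) @ S.conj().T       # S Cov(X) S^*
tee_law = S @ sample_cov(X) @ S.T               # S Cov(X) S^T, differs when S is not real

print(np.allclose(cov_SX, star_law))
print(np.allclose(cov_SX, tee_law))
```

Because the sample mean and sample covariance are themselves linear and sesquilinear in the data, the identity holds exactly for the sample quantities (up to rounding), not only in the limit of many samples.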


Thus, the covariance matrix of a random vector transforms according to a slightly different law from $(4.5.3^T)$, but it reduces to $(4.5.3^T)$ if the matrix $S$ is real.

As a final example, consider the general quadratic form
\[ Q_A(x) = \sum_{i,j=1}^{n} a_{ij}x_ix_j = x^TAx, \quad x = [x_i] \in \mathbf{C}^n \]
and the Hermitian form
\[ H_B(x) = \sum_{i,j=1}^{n} b_{ij}\bar{x}_ix_j = x^*Bx, \quad x = [x_i] \in \mathbf{C}^n \]
where $A = [a_{ij}]$ and $B = [b_{ij}]$. If $S \in M_n$ is a given matrix, then
\[ Q_A(Sx) = (Sx)^TA(Sx) = x^T(S^TAS)x = Q_{S^TAS}(x) \]
\[ H_B(Sx) = (Sx)^*B(Sx) = x^*(S^*BS)x = H_{S^*BS}(x) \]
In this example it does not matter whether $A$, $B$, $S$, and $x$ are real or complex. There are two slightly different laws of transformation at work here, and this is the reason for the following definition.

4.5.4 Definition. Let $A, B \in M_n$ be given. If there exists a nonsingular matrix $S$ such that
(a) $B = SAS^*$, then $B$ is said to be *congruent ("star-congruent") to $A$.
(b) $B = SAS^T$, then $B$ is said to be $^T$congruent ("tee-congruent") to $A$.

It is clear that these two notions of congruence must be closely related; they are the same if $S$ is a real matrix. When it is not important to distinguish between the two notions, we use the term congruence without a prefix. Some authors use the term conjunctive for *congruent, but we have chosen to deviate from this usage to have terms with greater mnemonic content.

Exercise. Show that congruent matrices have the same rank.

Notice that if $A$ is Hermitian, then so is $SAS^*$ (even if $S$ is singular); if $A$ is symmetric, then $SAS^T$ is also symmetric. Usually, one is interested in congruences that preserve the type of the matrix, *congruence for Hermitian matrices and $^T$congruence for symmetric matrices. If $A$ is real and symmetric, however, then it is both symmetric and Hermitian, and $SAS^*$ is then Hermitian and $SAS^T$ is symmetric. For a real symmetric matrix, one may wish to consider either *congruence or $^T$congruence, depending on the context. Both types of congruence share an important property with similarity.

4.5.5 Theorem. Both *congruence and $^T$congruence are equivalence relations. That is, for any $A \in M_n$,
(a) $A$ is congruent to $A$.
(b) If $A$ is congruent to $B$, then $B$ is congruent to $A$.
(c) If $A$ is congruent to $B$ and $B$ is congruent to $C$, then $A$ is congruent to $C$.

Proof: For (a), write $A = IAI^*$. If $A = SBS^*$ and $S$ is nonsingular, then $B = S^{-1}A(S^{-1})^*$. Finally, if $A = S_1BS_1^*$ and $B = S_2CS_2^*$, then $A = (S_1S_2)C(S_1S_2)^*$. The proof for $^T$congruence is formally the same. □

The set of all $n$-by-$n$ matrices is therefore partitioned into equivalence classes by congruence. As an abstract problem, we may seek a canonical representative of each equivalence class under each type of congruence. This problem is more complicated for *congruence, so we take up this case first.

The practical problem of understanding and classifying differential operators by identifying the invariants of the congruence relation leads us to the problem of identifying a canonical representative of the equivalence class of real symmetric matrices that are congruent (via a real matrix $S$) to a given matrix. It turns out that this problem has a simple solution: Just count the number of positive, negative, and zero eigenvalues. For this reason, we introduce the following terminology:

4.5.6 Definition. Let $A \in M_n$ be a Hermitian matrix. The inertia of $A$ is the ordered triple
\[ i(A) = (i_+(A),\ i_-(A),\ i_0(A)) \]
where $i_+(A)$ is the number of positive eigenvalues of $A$, $i_-(A)$ is the number of negative eigenvalues of $A$, and $i_0(A)$ is the number of zero eigenvalues of $A$, all counting multiplicity. Notice that the rank of $A$ is equal to $i_+(A) + i_-(A)$. The signature of $A$ is the quantity $i_+(A) - i_-(A)$.

Exercise. Show that the inertia of a Hermitian matrix $A \in M_n$ is uniquely determined if one knows both the signature and the rank of $A$, and conversely.

If $A \in M_n$ is a given Hermitian matrix, then $A = U\Lambda U^*$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $U$ unitary. It is convenient to assume that the positive eigenvalues occur first among the diagonal entries of $\Lambda$, then the negative eigenvalues, and then the zero eigenvalues. Thus $\lambda_1, \lambda_2, \ldots, \lambda_{i_+} > 0$, $\lambda_{i_++1}, \ldots, \lambda_{i_++i_-} < 0$, and $\lambda_{i_++i_-+1} = \cdots = \lambda_n = 0$. If we set
\[ D = \mathrm{diag}\big(+\sqrt{\lambda_1}, \ldots, +\sqrt{\lambda_{i_+}},\ +\sqrt{-\lambda_{i_++1}}, \ldots, +\sqrt{-\lambda_{i_++i_-}},\ 1, \ldots, 1\big) \]
then $D$ is a nonsingular real diagonal matrix, and
\[ \Lambda = D\,\mathrm{diag}(1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0)\,D \]
where the indicated matrix has exactly $i_+(A)$ "+1" terms, $i_-(A)$ "−1" terms, and $i_0(A)$ "0" terms. Thus, the matrix $A$ can be written as
\[ A = U\Lambda U^* = UD\,I(A)\,DU^* = S\,I(A)\,S^* \qquad (4.5.7) \]
where $S = UD$ is a nonsingular matrix, and $I(A) = \mathrm{diag}(1, \ldots, 1, -1, \ldots, -1, 0, \ldots, 0)$ is the inertia matrix of $A$. Thus, every Hermitian matrix is *congruent to a diagonal matrix of very simple form that is known once the inertia of the matrix is known. It would be attractive to use the inertia matrix as the canonical representative of the equivalence class of matrices that are *congruent to $A$, but in order to do so we must be certain that *congruent Hermitian matrices have the same inertia. This is the content of the following theorem, which is usually known as Sylvester's law of inertia.
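The construction leading to (4.5.7) translates directly into a few lines of code. The sketch below assumes NumPy; the function name `inertia_factorization` and the zero-eigenvalue tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def inertia_factorization(A, tol=1e-10):
    """Return (S, I_A) with A = S @ I_A @ S^*, S = U D nonsingular, as in (4.5.7)."""
    w, U = np.linalg.eigh(A)                    # A = U diag(w) U^*, U unitary
    pos = np.where(w > tol)[0]
    neg = np.where(w < -tol)[0]
    zero = np.where(np.abs(w) <= tol)[0]
    order = np.concatenate([pos, neg, zero])    # positive first, then negative, then zero
    w, U = w[order], U[:, order]
    # D has sqrt(|lambda|) for nonzero eigenvalues and 1 for zero eigenvalues
    d = np.where(np.abs(w) > tol, np.sqrt(np.abs(w)), 1.0)
    signs = np.concatenate([np.ones(pos.size), -np.ones(neg.size), np.zeros(zero.size)])
    S = U * d                                   # same as U @ np.diag(d)
    return S, np.diag(signs)

A = np.array([[1.0, 2.0j, 0.0],
              [-2.0j, 1.0, 0.0],
              [0.0, 0.0, 0.0]])                 # Hermitian, eigenvalues 3, -1, 0

S, I_A = inertia_factorization(A)
print(np.diag(I_A))
print(np.allclose(S @ I_A @ S.conj().T, A))
```

Replacing the zero eigenvalues by 1 in $D$ is what keeps $S$ nonsingular even when $A$ is singular.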


4.5.8 Theorem. Let $A, B \in M_n$ be Hermitian matrices. There is a nonsingular matrix $S \in M_n$ such that $A = SBS^*$ if and only if $A$ and $B$ have the same inertia, that is, the same number of positive, negative, and zero eigenvalues.

Proof: If $A$ and $B$ have the same inertia, then each can be represented in the form (4.5.7) with a possibly different $S$ for each, but with the same inertia matrix. Since the *congruence relation is transitive, and since $A$ and $B$ are *congruent to the same matrix, they are *congruent to each other. It is the converse that is more interesting.

Suppose $A$ and $B$ are *congruent and that $A = SBS^*$ for some nonsingular matrix $S \in M_n$. Since congruent matrices have the same rank, $i_0(A) = i_0(B)$ and we need only show that $i_+(A) = i_+(B)$. Let $v_1, v_2, \ldots, v_{i_+(A)}$ be orthonormal eigenvectors of $A$ corresponding to the positive eigenvalues $\lambda_1(A), \ldots, \lambda_{i_+(A)}(A)$, and let $S_+(A) = \mathrm{Span}\{v_1, \ldots, v_{i_+(A)}\}$. The dimension of $S_+(A)$ is $i_+(A)$, and if $x = \alpha_1v_1 + \cdots + \alpha_{i_+(A)}v_{i_+(A)} \neq 0$, then $x^*Ax = \lambda_1(A)|\alpha_1|^2 + \cdots + \lambda_{i_+(A)}(A)|\alpha_{i_+(A)}|^2 > 0$. But then
\[ x^*SBS^*x = (S^*x)^*B(S^*x) > 0 \]
so $y^*By > 0$ for all nonzero vectors $y$ in $\mathrm{Span}\{S^*v_1, \ldots, S^*v_{i_+(A)}\}$, which has dimension $i_+(A)$ because $S^*$ is nonsingular. By (4.3.23), we must have $i_+(B) \geq i_+(A)$. But since the roles of $A$ and $B$ in this argument can be reversed, we conclude that $i_+(B) = i_+(A)$. □
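Sylvester's law can be illustrated numerically: a *congruence changes the eigenvalues but not their signs. This is a sketch assuming NumPy; the helper `inertia`, the tolerance, and the particular well-conditioned test matrix $S$ are assumptions chosen for the example.

```python
import numpy as np

def inertia(H, tol=1e-8):
    """(i_plus, i_minus, i_zero) of a Hermitian matrix H, with an assumed zero cutoff tol."""
    w = np.linalg.eigvalsh(H)
    pos, neg = int(np.sum(w > tol)), int(np.sum(w < -tol))
    return pos, neg, w.size - pos - neg

rng = np.random.default_rng(1)
n = 5

# Hermitian A with prescribed eigenvalues 2, 1, -1, -3, 0, so i(A) = (2, 2, 1)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = Q @ np.diag([2.0, 1.0, -1.0, -3.0, 0.0]) @ Q.conj().T
A = (A + A.conj().T) / 2                        # remove rounding asymmetry

# A nonsingular, non-unitary S (a perturbation of 3I keeps it well conditioned)
S = 3 * np.eye(n) + 0.3 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
B = S @ A @ S.conj().T
B = (B + B.conj().T) / 2

print(inertia(A), inertia(B))
print(np.round(np.linalg.eigvalsh(A), 3), np.round(np.linalg.eigvalsh(B), 3))
```

The two inertia triples agree even though the eigenvalues themselves differ; how much they can differ is the subject of Ostrowski's theorem (4.5.9).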


Exercise. Let $A \in M_n$ be Hermitian. Show that $A$ is *congruent to the identity matrix if and only if all the eigenvalues of $A$ are positive.

Exercise. Let $A, B \in M_n$ be real symmetric matrices. Show that $A$ and $B$ are *congruent via a complex matrix if and only if they are congruent via a real matrix.

Exercise. Let $A, B \in M_n$ be real symmetric matrices. Show that $A$ and $B$ are congruent via a real matrix if and only if $A$ and $B$ have the same inertia.

Exercise. How many disjoint equivalence classes under *congruence are there in the set of $n$-by-$n$ complex Hermitian matrices? In the set of $n$-by-$n$ real symmetric matrices?

Sylvester's theorem settles completely the question of choosing a representative element from each equivalence class of Hermitian matrices under *congruence by guaranteeing that the signs of the eigenvalues of a Hermitian matrix do not change under *congruence. But how do the magnitudes of the eigenvalues change under *congruence? Using the simplest form of Weyl's theorem (4.3.1), we can give a quantitative form of Sylvester's theorem.


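4.5.9 Theorem (Ostrowski). Let $A, S \in M_n$ with $A$ Hermitian and $S$ nonsingular. Let the eigenvalues of $A$ and $SS^*$ be arranged in increasing order (4.2.1). For each $k = 1, 2, \ldots, n$ there exists a positive real number $\theta_k$ such that $\lambda_1(SS^*) \leq \theta_k \leq \lambda_n(SS^*)$ and
\[ \lambda_k(SAS^*) = \theta_k\lambda_k(A) \qquad (4.5.10) \]

Proof: First observe that if $SS^*x = \lambda x$ and $x \neq 0$, then $\lambda = x^*SS^*x/x^*x = (S^*x)^*(S^*x)/x^*x > 0$, so all the eigenvalues of $SS^*$ are positive. Let $k$ be a given integer, $1 \leq k \leq n$, and consider the Hermitian matrix $A - \lambda_k(A)I$, whose $k$th eigenvalue is zero. By Sylvester's theorem (4.5.8), the $k$th eigenvalue of $S(A - \lambda_k(A)I)S^* = SAS^* - \lambda_k(A)SS^*$ is also zero. Weyl's inequality (4.3.2) says that the $k$th eigenvalue of $SAS^* - \lambda_k(A)SS^*$ has the bounds
\[ \lambda_k(SAS^*) + \lambda_1(-\lambda_k(A)SS^*) \leq \lambda_k(SAS^* - \lambda_k(A)SS^*) = 0 \leq \lambda_k(SAS^*) + \lambda_n(-\lambda_k(A)SS^*) \]
or that
\[ \lambda_k(SAS^*) \leq -\lambda_1(-\lambda_k(A)SS^*) = \lambda_n(\lambda_k(A)SS^*) = \begin{cases} \lambda_k(A)\lambda_n(SS^*) & \text{if } \lambda_k(A) \geq 0 \\ \lambda_k(A)\lambda_1(SS^*) & \text{if } \lambda_k(A) \leq 0 \end{cases} \]
and
\[ \lambda_k(SAS^*) \geq -\lambda_n(-\lambda_k(A)SS^*) = \lambda_1(\lambda_k(A)SS^*) = \begin{cases} \lambda_k(A)\lambda_1(SS^*) & \text{if } \lambda_k(A) \geq 0 \\ \lambda_k(A)\lambda_n(SS^*) & \text{if } \lambda_k(A) \leq 0 \end{cases} \]
In either case [$\lambda_k(A) \geq 0$ or $\lambda_k(A) \leq 0$], these inequalities imply that $\lambda_k(SAS^*) = \theta_k\lambda_k(A)$ for some $\theta_k$ with $\lambda_1(SS^*) \leq \theta_k \leq \lambda_n(SS^*)$. □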
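The bounds in (4.5.9) are easy to observe numerically. The following sketch assumes NumPy and uses a nonsingular Hermitian $A$ so that the ratios $\theta_k = \lambda_k(SAS^*)/\lambda_k(A)$ are well defined; the test matrices are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Hermitian A with nonzero eigenvalues of both signs
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = Q @ np.diag([-3.0, -1.0, 0.5, 2.0, 4.0]) @ Q.conj().T
A = (A + A.conj().T) / 2

# Nonsingular S (perturbation of 3I), not unitary
S = 3 * np.eye(n) + 0.3 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

lam_A = np.linalg.eigvalsh(A)                   # increasing order, as in (4.2.1)
B = S @ A @ S.conj().T
lam_B = np.linalg.eigvalsh((B + B.conj().T) / 2)
lam_SS = np.linalg.eigvalsh(S @ S.conj().T)

theta = lam_B / lam_A                           # theta_k in (4.5.10)
print(theta)
print(lam_SS[0], lam_SS[-1])                    # lambda_1(SS^*) and lambda_n(SS^*)
```

Each computed $\theta_k$ is positive and lies between the smallest and largest eigenvalues of $SS^*$, which are the squares of the smallest and largest singular values of $S$.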


If $A = I \in M_n$ in Ostrowski's theorem, then all $\lambda_k(A) = 1$ and $\theta_k = \lambda_k(SS^*)$. If $S \in M_n$ is unitary, then $\lambda_1(SS^*) = \lambda_n(SS^*) = 1$ and all $\theta_k = 1$; this expresses the invariance of the eigenvalues under a unitary similarity. Thus, the bounds for $\theta_k$ given in the theorem are best possible for any given $A$ as well as for any given nonsingular $S$.

By a simple continuity argument, Ostrowski's theorem can be extended to cover the situation in which $S$ is singular. In this event, let $\epsilon > 0$ be small enough that $S + \epsilon I$ is nonsingular and apply the theorem with $S$ replaced by $S + \epsilon I$ to see that $\lambda_k((S + \epsilon I)A(S + \epsilon I)^*) = \theta_k\lambda_k(A)$ with $\lambda_1((S + \epsilon I)(S + \epsilon I)^*) \leq \theta_k \leq \lambda_n((S + \epsilon I)(S + \epsilon I)^*)$. Now let $\epsilon \to 0$ to obtain the bound $0 \leq \theta_k \leq \lambda_n(SS^*)$. This result may be thought of as an extension of Sylvester's law of inertia to singular *congruences.


4.5.11 Corollary. Let $A, S \in M_n$ and let $A$ be Hermitian. Let the eigenvalues of $A$ and $SS^*$ be arranged in increasing order (4.2.1). For each $k = 1, 2, \ldots, n$ there exists a nonnegative real number $\theta_k$ such that $\lambda_1(SS^*) \leq \theta_k \leq \lambda_n(SS^*)$ and
\[ \lambda_k(SAS^*) = \theta_k\lambda_k(A) \]
In particular, the number of positive (negative) eigenvalues of $SAS^*$ is less than or equal to the number of positive (negative) eigenvalues of $A$.

The problem of finding a canonical representative of the equivalence classes of complex symmetric matrices under $^T$congruence has an even simpler solution: Just compute the rank.


4.5.12 Theorem. Let $A, B \in M_n$ be (complex or real) symmetric matrices. There is a nonsingular matrix $S \in M_n$ such that $A = SBS^T$ if and only if $A$ and $B$ have the same rank.

Proof: If $A = SBS^T$ with $S$ nonsingular, then $A$ has the same rank as $B$ by (0.4.6). Conversely, use (4.4.4) to write
\[ A = U_1\Sigma_1U_1^T = U_1D_1\,I(\Sigma_1)\,D_1U_1^T = (U_1D_1)\,I(\Sigma_1)\,(U_1D_1)^T \]
where $I(\Sigma_1)$ is the inertia matrix (4.5.7) of $\Sigma_1$ and is determined solely by the rank of $A$, $U_1$ is unitary, $\Sigma_1 = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ with all $\sigma_i \geq 0$, and $D_1 = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ with
\[ d_i = \begin{cases} \sqrt{\sigma_i} & \text{if } \sigma_i > 0 \\ 1 & \text{if } \sigma_i = 0 \end{cases} \]
Notice that $D_1$ is nonsingular. In the same way, we can also write $B = (U_2D_2)\,I(\Sigma_2)\,(U_2D_2)^T$ with similar definitions. If we assume that rank $A$ = rank $B$, then $I(\Sigma_1) = I(\Sigma_2)$, and
\[ I(\Sigma_1) = (U_1D_1)^{-1}A[(U_1D_1)^{-1}]^T = I(\Sigma_2) = (U_2D_2)^{-1}B[(U_2D_2)^{-1}]^T \]
and hence
\[ A = (U_1D_1)(U_2D_2)^{-1}B[(U_1D_1)(U_2D_2)^{-1}]^T \]
We conclude that $A$ and $B$ are $^T$congruent. □

Exercise. How many disjoint equivalence classes under $^T$congruence are there in the set of $n$-by-$n$ complex symmetric matrices? In the set of $n$-by-$n$ real symmetric matrices?


Exercise. Let $A \in M_n$ be symmetric. Show that there exists a nonsingular $S \in M_n$ such that $A = SS^T$ if and only if $A$ is nonsingular.

Exercise. Let $A, B \in M_n$ be symmetric. Show that there exist two nonsingular matrices $X, Y \in M_n$ such that $A = XBY$, that is, $A$ and $B$ are equivalent, if and only if there exists a nonsingular $S \in M_n$ such that $A = SBS^T$, that is, $A$ and $B$ are $^T$congruent. Hint: If $A = XBY$, how are the ranks of $A$ and $B$ related?

The preceding result is an analog of Sylvester's law of inertia (4.5.8) for $^T$congruence of complex matrices. The following result is an analog of Ostrowski's quantitative version [(4.5.9) and (4.5.11)] of Sylvester's theorem.


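4.5.13 Theorem. Let $A, S \in M_n$ with $A = A^T$. Let $A = U\Sigma U^T$ and $SAS^T = VMV^T$ be Takagi factorizations (4.4.4) of $A$ and $SAS^T$ with $U$ and $V$ unitary, $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, and $M = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_n)$ with all $\sigma_i, \mu_i \geq 0$. Let $\lambda_i(SS^*)$ denote the eigenvalues of $SS^*$. Suppose the numbers $\sigma_i$, $\mu_i$, and $\lambda_i(SS^*)$ are all arranged in increasing order (4.2.1). For each $k = 1, 2, \ldots, n$ there exists a nonnegative real number $\theta_k$ with $\lambda_1(SS^*) \leq \theta_k \leq \lambda_n(SS^*)$ such that $\mu_k = \theta_k\sigma_k$. If $S$ is nonsingular, all $\theta_k > 0$.

Proof: The numbers $\mu_k^2$ are the eigenvalues of $BB^*$, where $B = SAS^T$. Thus,
\[ \mu_k^2 = \lambda_k(BB^*) = \lambda_k(SAS^T\bar{S}\bar{A}S^*) = \lambda_k(S[AS^T\bar{S}\bar{A}]S^*) = \theta_k'\lambda_k(AS^T\bar{S}\bar{A}) \]
for some $\theta_k'$ with $\lambda_1(SS^*) \leq \theta_k' \leq \lambda_n(SS^*)$; we have used (4.5.11) to obtain the last equality. Since the eigenvalues of a product of two matrices are independent of the order of the product (1.3.20), we also have
\[ \mu_k^2 = \theta_k'\lambda_k(AS^T\bar{S}\bar{A}) = \theta_k'\lambda_k(\bar{S}\bar{A}AS^T) = \theta_k'\lambda_k(SA\bar{A}S^*) \]
because the eigenvalues $\lambda_k$ are real. Applying (4.5.11) again, we obtain
\[ \mu_k^2 = \theta_k'\theta_k''\lambda_k(A\bar{A}) = \theta_k'\theta_k''\sigma_k^2 \]
for some $\theta_k''$ with $\lambda_1(SS^*) \leq \theta_k'' \leq \lambda_n(SS^*)$. Thus, $\mu_k = \theta_k\sigma_k$ with $\theta_k = (\theta_k'\theta_k'')^{1/2}$, and $\theta_k$ satisfies the required bounds. □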


We know from (1.3.19) that two diagonalizable matrices can be simultaneously diagonalized by the same similarity if and only if they commute. What is the corresponding result for simultaneous diagonalization by congruence?

Perhaps the earliest motivation for results about simultaneous diagonalization by congruence came from mechanics in the study of "small oscillations" about a stable equilibrium. If the configuration of a dynamical system is specified by generalized (Lagrangian) coordinates $q_1, q_2, \ldots, q_n$ in which the origin is a point of stable equilibrium, then near the origin the potential energy function $V$ can be approximated by a real quadratic form
\[ V = \sum_{i,j=1}^{n} a_{ij}q_iq_j \]
in the generalized coordinates $q_i$. The kinetic energy $T$ can be approximated by a real quadratic form
\[ T = \sum_{i,j=1}^{n} b_{ij}\dot{q}_i\dot{q}_j \]
in the generalized velocities $\dot{q}_i$. The behavior of the system is governed by Lagrange's equations
\[ \frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_i}\right) - \frac{\partial T}{\partial q_i} = -\frac{\partial V}{\partial q_i} \]
which are a system of second-order linear ordinary differential equations with constant coefficients. These equations are coupled (and hence more difficult to solve) if the two quadratic forms $T$ and $V$ are not diagonal. We may assume that the real matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ are symmetric.

If a real nonsingular transformation $S = [s_{ij}] \in M_n$ can be found such that $SAS^T$ and $SBS^T$ are both diagonal, then with respect to new generalized coordinates $p_i$ with
\[ q_i = \sum_{j=1}^{n} s_{ij}p_j \qquad (4.5.14) \]
the kinetic energy and potential energy quadratic forms $T$ and $V$ will both be diagonal. In this event, Lagrange's equations will be an uncoupled set of $n$ separate second-order linear ordinary differential equations with constant coefficients. These equations can be solved easily in terms of exponentials and trigonometric functions, and the solution to the original problem can be obtained by using (4.5.14).

Thus, a substantial simplification in an important class of mechanics problems can be achieved if we can simultaneously diagonalize two real symmetric matrices by congruence. On physical grounds, the kinetic energy quadratic form is positive definite, and it turns out that this is a sufficient (but not necessary) condition for simultaneous diagonalization by congruence.
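When the kinetic energy matrix $B$ is positive definite, one standard way to produce such an $S$ numerically is to combine a Cholesky factorization of $B$ with the spectral theorem. The sketch below assumes NumPy; it is one common construction and not necessarily the argument developed in the text, and the 2-by-2 matrices are made-up illustrative data.

```python
import numpy as np

def simultaneous_diagonalize(A, B):
    """Given real symmetric A and real symmetric positive definite B, return (S, lam)
    with S nonsingular, S @ B @ S.T = I and S @ A @ S.T = diag(lam)."""
    L = np.linalg.cholesky(B)                   # B = L L^T with L lower triangular
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T                     # symmetric, congruent to A
    C = (C + C.T) / 2                           # remove rounding asymmetry
    lam, Q = np.linalg.eigh(C)                  # C = Q diag(lam) Q^T, Q real orthogonal
    S = Q.T @ L_inv
    return S, lam

A = np.array([[3.0, -1.0],
              [-1.0, 1.0]])                     # potential energy matrix (illustrative)
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])                      # kinetic energy matrix, positive definite

S, lam = simultaneous_diagonalize(A, B)
print(np.round(S @ B @ S.T, 12))
print(np.round(S @ A @ S.T, 12))
print(lam)
```

In this normalization the kinetic energy form becomes the identity, so each uncoupled equation involves only the corresponding diagonal entry of the transformed potential energy matrix.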

There are several types of simultaneous diagonalization results that we might consider. We might have two Hermitian matrices $A$ and $B$ and we might wish to have $UAU^*$ and $UBU^*$ diagonal for some unitary matrix $U$, or we might be satisfied with the weaker result of having $SAS^*$ and $SBS^*$ diagonal for some nonsingular matrix $S$. Similarly, if $A$ and $B$ are symmetric, we might want $UAU^T$ and $UBU^T$ or $SAS^T$ and $SBS^T$ to be diagonal. We might even have a mixed problem with $A$ Hermitian and $B$ symmetric and want $UAU^*$ and $UBU^T$ or $SAS^*$ and $SBS^T$ to be diagonal. In each case, the natural congruence to consider is one that preserves the special algebraic character of the respective matrix. All of these situations arise in the applications. Fortunately, they can all be treated with the same techniques. The simplest case to consider is when one of the two matrices is nonsingular. We present the results in Table 4.5.15T, which gives a list of equivalent necessary and sufficient conditions for each case. The necessary and sufficient conditions are ordered and numbered to suggest parallelism among the various cases.


4.5.15 Theorem. Let A, B ∈ M_n be given. Let U denote a unitary matrix
and S a nonsingular matrix with U, S ∈ M_n. Then we have Table 4.5.15T.


Proof: Within each of the six groups of conditions, the equivalence of
most of the stated conditions is a matter of definition. The equivalence of
(3) and (4) in group I(a) follows from the observation that, if A and B are
Hermitian, then AB is Hermitian if and only if A commutes with B, and A
is Hermitian if and only if A^{-1} is Hermitian. The equivalence of III(a)(3)
and (4) is established similarly, since B is symmetric if and only if \bar{B}
is symmetric, and A^T = \bar{A} if A is Hermitian.

Within each of the six groups, the necessity of condition (1) follows
directly from the assumption that the respective congruences are in
diagonal form. For example, in case II(b) if SAS^T = Λ and SBS^T = M are
both diagonal, then

\[ A^{-1}B = (S^T \Lambda^{-1} S)\,[S^{-1} M (S^T)^{-1}] = S^T (\Lambda^{-1} M)(S^T)^{-1} \]

so R = S^T will diagonalize C = A^{-1}B. Similarly, in cases I(b) and III(b),
R = S* will work. If S is unitary, the corresponding matrix R in each case
is also unitary.
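
The necessity computation for case II(b) can be checked numerically. The
sketch below is illustrative only: the matrices S, Lam, and M are arbitrary
choices, and the helper functions are hypothetical. It builds A and B so
that SAS^T = Lam and SBS^T = M, and then confirms that R = S^T carries
C = A^{-1}B to the diagonal matrix Lam^{-1}M by similarity.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(X):
    """Inverse of a nonsingular 2x2 matrix."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

S = [[1.0, 2.0], [0.0, 1.0]]        # nonsingular (illustrative choice)
Lam = [[2.0, 0.0], [0.0, -1.0]]     # nonsingular diagonal, so A is nonsingular
M = [[3.0, 0.0], [0.0, 4.0]]        # diagonal

Sinv = inv2(S)
A = matmul(matmul(Sinv, Lam), transpose(Sinv))   # chosen so that S A S^T = Lam
B = matmul(matmul(Sinv, M), transpose(Sinv))     # chosen so that S B S^T = M

R = transpose(S)                    # R = S^T
C = matmul(inv2(A), B)              # C = A^{-1} B
D = matmul(matmul(inv2(R), C), R)   # R^{-1} C R, expected to equal Lam^{-1} M
```

With these choices D agrees with diag(3/2, -4/1) = diag(1.5, -4.0), which
is Lam^{-1}M, and both A and B are symmetric by construction.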

Consider the cases I in which A and B are Hermitian and A is
nonsingular. Make the assumption I(b)(1) that there is a nonsingular
R = [r_1 r_2 ... r_n] ∈ M_n, each r_i ∈ C^n, and a diagonal
Λ = diag(λ_1, λ_2, ..., λ_n) with all λ_i real such that R^{-1}A^{-1}BR = Λ,
and hence BR = ARΛ and R*BR = R*ARΛ. There is no loss of generality to
assume that multiple values of the λ_i terms are grouped together, so that
Λ has the block form


Table 4.5.15T. Equivalent necessary and sufficient conditions for
simultaneous diagonalization

I. Assumptions on A and B: A = A*, B = B*, A nonsingular; set C = A^{-1}B.

   (a) That which is to be diagonal: UAU* and UBU*
       (1) There is a unitary V ∈ M_n such that V*CV is a real diagonal matrix.
       (2) C has real eigenvalues and is unitarily diagonalizable.
       (3) C is Hermitian.
       (4) AB = BA, i.e., A commutes with B.

   (b) That which is to be diagonal: SAS* and SBS*
       (1) There is a nonsingular R ∈ M_n such that R^{-1}CR is real diagonal.
       (2) C has real eigenvalues and is diagonalizable.

II. Assumptions on A and B: A = A^T, B = B^T, A nonsingular; set C = A^{-1}B.

   (a) That which is to be diagonal: UAU^T and UBU^T
       (1) There is a unitary V ∈ M_n such that V*CV is diagonal.
       (2) C is unitarily diagonalizable.
       (3) C is normal.

   (b) That which is to be diagonal: SAS^T and SBS^T
       (1) There is a nonsingular R ∈ M_n such that R^{-1}CR is diagonal.
       (2) C is diagonalizable.

III. Assumptions on A and B: A = A*, B = B^T. If A is nonsingular, set
     C = A^{-1}B. If B is nonsingular, set C = B^{-1}A.

   (a) That which is to be diagonal: UAU* and UBU^T
       (1) There is a unitary W ∈ M_n such that W^{-1}C\bar{W} is diagonal.
       (3) C is symmetric.
       (4) AB = B\bar{A}.

   (b) That which is to be diagonal: SAS* and SBS^T
       (1) There is a nonsingular R ∈ M_n such that R^{-1}C\bar{R} is diagonal.
       (5) There is a nonsingular R ∈ M_n such that R^{-1}C\bar{R} is symmetric.

\[ \Lambda = \begin{bmatrix} \Lambda_1 & & & 0 \\ & \Lambda_2 & & \\ & & \ddots & \\ 0 & & & \Lambda_k \end{bmatrix}, \qquad \Lambda_i \in M_{n_i},\ 1 \le n_i \le n;\quad \Lambda_i = \mu_i I,\ i = 1, 2, \ldots, k \tag{4.5.16} \]

with all μ_i real and μ_i ≠ μ_j if i ≠ j. If not all the λ_i terms are equal,
choose any i, j with 1 ≤ i, j ≤ n such that λ_i ≠ λ_j and consider the i, j
entry of both sides of the identity R*BR = R*ARΛ. This is

\[ r_i^* A r_j \lambda_j = r_i^* B r_j = \overline{r_j^* B r_i} = \overline{r_j^* A r_i \lambda_i} = r_i^* A r_j \lambda_i \]

where we have used the facts that A and B are Hermitian (so
x*Ay = \overline{y*Ax} for all x, y ∈ C^n) and that λ_i and λ_j are real.
Since λ_i ≠ λ_j, we conclude that r_i*Ar_j = 0 and hence that
r_j*Ar_i = r_j*Br_i = r_i*Br_j = 0. This means that the matrices R*BR and
R*AR are both block diagonal and conformal with (4.5.16); that is,

\[ R^*BR = \begin{bmatrix} B_1 & & & 0 \\ & B_2 & & \\ & & \ddots & \\ 0 & & & B_k \end{bmatrix} = R^*AR\Lambda = \begin{bmatrix} \mu_1 A_1 & & & 0 \\ & \mu_2 A_2 & & \\ & & \ddots & \\ 0 & & & \mu_k A_k \end{bmatrix} \]

with B_i, A_i ∈ M_{n_i} for i = 1, 2, ..., k. This is a partial reduction to
diagonal form that is complete if k = n, that is, if all λ_i are distinct.
If k < n, then some block has n_i > 1 and B_i = μ_i A_i. Since A_i and B_i are
Hermitian, we may use the spectral theorem (4.1.5) to write A_i = U_i D_i U_i*
with U_i, D_i ∈ M_{n_i}, U_i unitary, and D_i real diagonal. Then
B_i = μ_i A_i = U_i(μ_i D_i)U_i* is diagonalized as well. If we set

\[ U = \begin{bmatrix} U_1 & & & 0 \\ & U_2 & & \\ & & \ddots & \\ 0 & & & U_k \end{bmatrix}, \qquad D = \begin{bmatrix} D_1 & & & 0 \\ & D_2 & & \\ & & \ddots & \\ 0 & & & D_k \end{bmatrix} \]

with U_i = [1] if n_i = 1, then U is unitary, D is real diagonal, and

\[ R^*BR = U(D\Lambda)U^* \quad\text{and}\quad R^*AR = UDU^* \]

Finally, we have the required representations

\[ A = [(R^{-1})^*U]\,D\,[(R^{-1})^*U]^* \quad\text{and}\quad B = [(R^{-1})^*U]\,(D\Lambda)\,[(R^{-1})^*U]^* \]

Notice that if we assume I(a)(1), then the argument is the same except
that we also know that the matrix R is unitary. In this event, (R^{-1})*U =
RU is unitary and the sufficiency of I(a)(1) is proved.

The arguments required in the remaining four cases are similar. One 
uses the respective hypothesis to obtain congruent matrices that are block 
diagonal and then uses the spectral theorem for Hermitian matrices or 
Takagi’s factorization (4.4.4) for symmetric matrices to complete the 
reduction to diagonal form. 

Consider the cases II in which A and B are symmetric and A is
nonsingular. Make the assumption II(b)(1) that there is a nonsingular
R = [r_1 r_2 ... r_n] ∈ M_n, each r_i ∈ C^n, and a diagonal
Λ = diag(λ_1, λ_2, ..., λ_n) (not necessarily real) such that
R^{-1}A^{-1}BR = Λ, and hence BR = ARΛ and R^T BR = R^T ARΛ. Assume again
that the multiple values of the λ_i terms are grouped together so that Λ has
the form (4.5.16) with all μ_i distinct. If not all the λ_i terms are equal,
choose any i, j with 1 ≤ i, j ≤ n such that λ_i ≠ λ_j and consider the i, j
entry of both sides of the identity R^T BR = R^T ARΛ. This is

\[ r_i^T A r_j \lambda_j = r_i^T B r_j = r_j^T B r_i = r_j^T A r_i \lambda_i = r_i^T A r_j \lambda_i \]

where we have used the symmetry of A and B (x^T Ay = y^T Ax for all
x, y ∈ C^n). Since λ_i ≠ λ_j, we conclude that r_i^T Ar_j = 0 and hence that
r_j^T Ar_i = r_j^T Br_i = r_i^T Br_j = 0. This means that the matrices R^T BR
and R^T AR are both block diagonal and conformal with (4.5.16); that is,

\[ R^TBR = \begin{bmatrix} B_1 & & 0 \\ & \ddots & \\ 0 & & B_k \end{bmatrix} = R^TAR\Lambda = \begin{bmatrix} \mu_1 A_1 & & & 0 \\ & \mu_2 A_2 & & \\ & & \ddots & \\ 0 & & & \mu_k A_k \end{bmatrix} \]

with B_i, A_i ∈ M_{n_i}. If k = n, this is the required reduction. If k < n,
then some block has n_i > 1 and B_i = μ_i A_i. Since A_i and B_i are symmetric, we

may use Takagi's factorization (4.4.4) to write A_i = U_i Σ_i U_i^T with
U_i, Σ_i ∈ M_{n_i}, U_i unitary, and Σ_i diagonal with nonnegative diagonal
entries. Then B_i = μ_i A_i = U_i(μ_i Σ_i)U_i^T. If we set

\[ U = \begin{bmatrix} U_1 & & & 0 \\ & U_2 & & \\ & & \ddots & \\ 0 & & & U_k \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_1 & & & 0 \\ & \Sigma_2 & & \\ & & \ddots & \\ 0 & & & \Sigma_k \end{bmatrix} \]

with U_i = [1] if n_i = 1, then U is unitary, Σ is diagonal (with nonnegative
diagonal entries), and

\[ R^TBR = U(\Sigma\Lambda)U^T \quad\text{and}\quad R^TAR = U\Sigma U^T \]

Finally, we have the required representations

\[ A = [(R^{-1})^TU]\,\Sigma\,[(R^{-1})^TU]^T \quad\text{and}\quad B = [(R^{-1})^TU]\,\Sigma\Lambda\,[(R^{-1})^TU]^T \]

If we assume II(a)(1), then R is unitary and (R^{-1})^T U = \bar{R}U is
unitary, so the sufficiency of II(a)(1) is also proved.

In the cases III, a slight change in the argument is necessary. Let
A, B ∈ M_n with A Hermitian and nonsingular and B symmetric. Make the
assumption III(b)(1) that there is a nonsingular R = [r_1 r_2 ... r_n] ∈ M_n
and a diagonal Λ = diag(λ_1, λ_2, ..., λ_n) such that R^{-1}A^{-1}B\bar{R} = Λ,
and hence B\bar{R} = ARΛ and R*B\bar{R} = \bar{R}^T B\bar{R} = R*ARΛ. Assume
now that the λ_i terms of equal modulus are grouped together so that Λ has
the form

\[ \Lambda = \begin{bmatrix} \Lambda_1 & & & 0 \\ & \Lambda_2 & & \\ & & \ddots & \\ 0 & & & \Lambda_k \end{bmatrix}, \qquad\text{where } \Lambda_i = \begin{bmatrix} \mu_1^{(i)} & & 0 \\ & \ddots & \\ 0 & & \mu_{n_i}^{(i)} \end{bmatrix} \]

with |μ_j^{(i)}| = |μ_k^{(i)}| for j, k = 1, 2, ..., n_i, and
|μ_j^{(i)}| ≠ |μ_k^{(l)}| if i ≠ l. If not all the λ_i terms have the same
modulus, choose any i, j with 1 ≤ i, j ≤ n and |λ_i| ≠ |λ_j| and consider
the i, j entry of both sides of the identity \bar{R}^T B\bar{R} = R*ARΛ.
This is

\[ r_i^* A r_j \lambda_j = \bar{r}_i^T B \bar{r}_j = \bar{r}_j^T B \bar{r}_i = r_j^* A r_i \lambda_i = \overline{r_i^* A r_j}\,\lambda_i \]

where we have used the fact that A is Hermitian and B is symmetric.
From this it follows that |r_i*Ar_j||λ_j| = |r_i*Ar_j||λ_i|, and since
|λ_i| ≠ |λ_j|, we conclude that r_i*Ar_j = 0, and hence that
r_j*Ar_i = \bar{r}_i^T B\bar{r}_j = \bar{r}_j^T B\bar{r}_i = 0.
This means that the matrices \bar{R}^T B\bar{R} and R*AR are both block
diagonal and conformal with (4.5.16); that is,

\[ \bar{R}^TB\bar{R} = \begin{bmatrix} B_1 & & & 0 \\ & B_2 & & \\ & & \ddots & \\ 0 & & & B_k \end{bmatrix} = R^*AR\Lambda = \begin{bmatrix} A_1\Lambda_1 & & & 0 \\ & A_2\Lambda_2 & & \\ & & \ddots & \\ 0 & & & A_k\Lambda_k \end{bmatrix} \]

with all B_i, A_i, Λ_i ∈ M_{n_i} and Λ_i = σ_i D_i^2 with σ_i ≥ 0,

\[ D_i = \mathrm{diag}\bigl(e^{i\theta_1^{(i)}}, e^{i\theta_2^{(i)}}, \ldots, e^{i\theta_{n_i}^{(i)}}\bigr) \]

and all θ_j^{(i)} ∈ R. If k = n, this is the required reduction. If k < n,
then some block has n_i > 1 and B_i = A_iΛ_i = σ_i A_i D_i^2. Since D_i is
diagonal and unitary, \bar{D}_i^T = \bar{D}_i = D_i* = D_i^{-1} and hence

\[ \bar{D}_i^T B_i \bar{D}_i = \sigma_i D_i^* A_i D_i \tag{4.5.17} \]

The left-hand side of this identity is a symmetric matrix
\bar{D}_i B_i \bar{D}_i, and the right-hand side is a Hermitian matrix
σ_i D_i*A_iD_i with σ_i real. If σ_i ≠ 0, we conclude that D_i*A_iD_i is both
Hermitian and symmetric. The only way a Hermitian matrix can be symmetric
is if it is real, so D_i*A_iD_i is a real symmetric matrix if σ_i ≠ 0. If
σ_i = 0 (which can happen for at most one value of i), then D_i*A_iD_i is
Hermitian, but not necessarily real. By the spectral theorem, for each
i = 1, ..., k there is a unitary matrix U_i ∈ M_{n_i} and a real diagonal
matrix M_i such that D_i*A_iD_i = U_iM_iU_i*. If σ_i ≠ 0, then U_i may be
chosen to be a real orthogonal matrix, in which case U_i^T = U_i* and

\[ \bar{D}_i^T B_i \bar{D}_i = \sigma_i D_i^* A_i D_i = U_i(\sigma_i M_i)U_i^T \]

If σ_i = 0, then it may not be true that U_i* = U_i^T, but nevertheless the
displayed equation is still correct because both sides vanish. Thus, for
all i = 1, 2, ..., k we have

\[ A_i = (D_iU_i)M_i(D_iU_i)^* \quad\text{and}\quad B_i = (D_iU_i)(\sigma_iM_i)(D_iU_i)^T \]

If we set



\[ U = \begin{bmatrix} D_1U_1 & & & 0 \\ & D_2U_2 & & \\ & & \ddots & \\ 0 & & & D_kU_k \end{bmatrix}, \quad M = \begin{bmatrix} M_1 & & 0 \\ & \ddots & \\ 0 & & M_k \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_1 I & & 0 \\ & \ddots & \\ 0 & & \sigma_k I \end{bmatrix} \]

then B = [(R^{-1})*U]ΣM[U^T \bar{R}^{-1}] and A = [(R^{-1})*U]M[U*R^{-1}],
as asserted. If we assume III(a)(1), then R is unitary and hence
(R^{-1})*U = RU is unitary, so the sufficiency of III(a)(1) is proved.

This completes the proof of III when A is nonsingular. If B is
nonsingular, the hypothesis III(b)(1) says that there is a nonsingular
R ∈ M_n such that R^{-1}B^{-1}A\bar{R} = Λ is diagonal, so A\bar{R} = BRΛ and
R^T A\bar{R} = R^T BRΛ. From this point, the argument is formally the same
as in the case in which A is nonsingular. One merely interchanges A and B
in the proof and uses the Takagi factorization (4.4.4) to diagonalize
D_i^T B_i D_i instead of the spectral theorem. □

In cases I and II of Theorem (4.5.15) (Table 4.5.15T) there is a
familiar condition on A^{-1}B that is equivalent to simultaneous
diagonalizability by the respective congruence, namely that A^{-1}B is
diagonalizable (perhaps with real eigenvalues), that is, A^{-1}B is of the
form RΛR^{-1} with Λ diagonal (perhaps with Λ real). This condition can, in
principle, be tested by examining the minimal polynomial of A^{-1}B
to see if it has distinct linear (perhaps real) factors. In case III,
however, the condition is not so familiar, namely that A^{-1}B is of the
form RΛ\bar{R}^{-1} with Λ diagonal. This requirement says that A^{-1}B is
diagonalizable by consimilarity, rather than by ordinary similarity. See
Section (4.6) for a discussion of consimilarity. Theorem (4.6.11) shows that
condition (4.5.15.III(b)(1)) is equivalent to the condition that C\bar{C}
has all real nonnegative eigenvalues and is diagonalizable and
rank C = rank C\bar{C}.

It was convenient to make a nonsingularity assumption in Theorem
(4.5.15), but this assumption can be eliminated in the cases I(a), II(a),
and III(a) of unitary congruence. In case I(a), this approach gives an
alternative proof of the classical result (4.1.6) on simultaneous unitary
diagonalization of commuting Hermitian matrices.

4.5.18 Corollary. Let A, B ∈ M_n.

(a) If A and B are both Hermitian, there exists a unitary U ∈ M_n
such that UAU* and UBU* are both diagonal if and only if AB is
Hermitian; that is, AB = BA.

(b) If A and B are both symmetric, there exists a unitary U ∈ M_n
such that UAU^T and UBU^T are both diagonal if and only if A\bar{B}
is normal; that is, A\bar{B}B\bar{A} = B\bar{A}A\bar{B}.

(c) If A is Hermitian and B is symmetric, there exists a unitary
U ∈ M_n such that UAU* and UBU^T are both diagonal if and only
if AB is symmetric; that is, AB = B\bar{A}.
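
The easy direction in case I(a), that two Hermitian matrices diagonalized
by the same unitary congruence must commute, can be checked on a small
example. This is only an illustrative sketch: the unitary U and the real
diagonal matrices Lam and M are arbitrary choices, and the helper functions
are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def ctranspose(X):
    """Conjugate transpose X*."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def max_abs_diff(X, Y):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(len(X)) for j in range(len(X[0])))

s = 1.0 / math.sqrt(2.0)
U = [[s, s * 1j], [s * 1j, s]]          # a unitary matrix (illustrative choice)
Lam = [[2.0, 0.0], [0.0, -1.0]]         # real diagonal
M = [[5.0, 0.0], [0.0, 3.0]]            # real diagonal

A = matmul(matmul(ctranspose(U), Lam), U)   # A = U* Lam U, so U A U* = Lam
B = matmul(matmul(ctranspose(U), M), U)     # B = U* M U,   so U B U* = M

AB = matmul(A, B)
BA = matmul(B, A)
commutator_size = max_abs_diff(AB, BA)                # about 0: A commutes with B
hermitian_defect = max_abs_diff(AB, ctranspose(AB))   # about 0: AB is Hermitian
```

Both quantities vanish up to rounding, in agreement with the equivalence
of "AB is Hermitian" and "AB = BA" for Hermitian A and B.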


Proof: (a) If UAU* = Λ and UBU* = M are both diagonal, then A =
U*ΛU, B = U*MU and AB = U*ΛUU*MU = U*ΛMU = U*MΛU =
U*MUU*ΛU = BA. Conversely, if AB = BA, then A_ε = A + εI is
nonsingular and Hermitian for some ε > 0 and A_εB = (A + εI)B = AB + εB =
BA + εB = B(A + εI) = BA_ε. Thus, B commutes with A_ε and A_ε^{-1} and hence
A_ε^{-1}B is Hermitian. By I(a)(3) of (4.5.15) (Table 4.5.15T) there exists a
unitary U_ε such that U_εA_εU_ε* = U_εAU_ε* + εI = Λ_ε and U_εBU_ε* = M_ε are
both diagonal, and hence U_εAU_ε* = Λ_ε - εI and U_εBU_ε* = M_ε are both
diagonal.

(b) ... Assume that A is nonsingular. Then A\bar{B} = (A^{-1})^{-1}\bar{B}
is normal, and II(a)(3) of (4.5.15) says that the two symmetric matrices
A^{-1} and \bar{B} are simultaneously unitarily diagonalizable. Thus, there
is a unitary U ∈ M_n and diagonal Λ, M ∈ M_n such that A^{-1} = UΛU^T and
\bar{B} = UMU^T. Then A = \bar{U}Λ^{-1}\bar{U}^T and B = \bar{U}\bar{M}\bar{U}^T,
which is a simultaneous diagonalization of A and B of the required form.
If A is singular, then by (4.4.4) there is a unitary U ∈ M_n such that
UAU^T is diagonal, and we can permute the columns of U if necessary so that

\[ UAU^T = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma \in M_k,\ 1 \le k < n \]

and Σ is symmetric (in fact, diagonal) and nonsingular. If we write
UBU^T in corresponding block form

\[ UBU^T = \begin{bmatrix} B_{11} & B_{12} \\ B_{12}^T & B_{22} \end{bmatrix}, \qquad B_{11} \in M_k,\ B_{22} \in M_{n-k} \]

then the blocks B_11 and B_22 are symmetric and we have

\[ UA\bar{B}U^* = (UAU^T)\,\overline{(UBU^T)} = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \bar{B}_{11} & \bar{B}_{12} \\ \bar{B}_{12}^T & \bar{B}_{22} \end{bmatrix} = \begin{bmatrix} \Sigma\bar{B}_{11} & \Sigma\bar{B}_{12} \\ 0 & 0 \end{bmatrix} \]

But UA\bar{B}U* is normal, too, and hence Σ\bar{B}_12 = 0 (see Problem 20 at
the end of this section) and B_12 = 0 since Σ is nonsingular. This shows that

\[ UAU^T = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}, \qquad UBU^T = \begin{bmatrix} B_{11} & 0 \\ 0 & B_{22} \end{bmatrix} \]

and

\[ (UAU^T)\,\overline{(UBU^T)} = \begin{bmatrix} \Sigma\bar{B}_{11} & 0 \\ 0 & 0 \end{bmatrix} \]

By the previous argument for the nonsingular case, we know that there
is a unitary V_1 ∈ M_k and diagonal Λ_1, Λ_2 ∈ M_k such that Σ = V_1Λ_1V_1^T
and B_11 = V_1Λ_2V_1^T. Since B_22 is symmetric, we also know there is a
unitary V_2 ∈ M_{n-k} and a diagonal Λ_3 ∈ M_{n-k} such that
B_22 = V_2Λ_3V_2^T. If we let Λ ≡ Λ_1 ⊕ 0 ∈ M_n, M ≡ Λ_2 ⊕ Λ_3, and
V ≡ V_1 ⊕ V_2, we have UAU^T = VΛV^T, UBU^T = VMV^T. Thus,
A = (U*V)Λ(U*V)^T and B = (U*V)M(U*V)^T is a simultaneous diagonalization
of the required form.

(c) If UAU* = Λ and UBU^T = M are both diagonal, then Λ is
necessarily real. We have A = U*ΛU, B = U*M\bar{U}, and

\[ AB = U^*\Lambda UU^*M\bar{U} = U^*\Lambda M\bar{U} = U^*M\Lambda\bar{U} = U^*M\bar{U}U^T\Lambda\bar{U} = (U^*M\bar{U})(U^T\Lambda\bar{U}) = B\bar{A} \]

Conversely, if AB = B\bar{A}, then A_ε = A + εI is nonsingular and Hermitian
for some ε > 0 and A_εB = AB + εB = B\bar{A} + εB = B\bar{A}_ε. Thus, condition
III(a)(4) of (4.5.15) is satisfied and there exists a unitary U_ε ∈ M_n such
that U_εA_εU_ε* = U_εAU_ε* + εI = Λ_ε and U_εBU_ε^T = M_ε are both diagonal,
and hence U_εAU_ε* = Λ_ε - εI and U_εBU_ε^T = M_ε are both diagonal. □

The problem of simultaneously diagonalizing two singular Hermitian
matrices by *congruence (not necessarily unitarily) is considered in
Problem 8.

We have seen that, under *congruence, a single Hermitian matrix may
always be brought to a remarkably simple form (diagonal with ±1 or 0's
on the diagonal), and, under certain conditions, a pair of Hermitian
matrices may be brought simultaneously into diagonal form by
*congruence. A natural question to raise, then, is: Into what canonical
form may a pair of general Hermitian matrices A, B ∈ M_n be brought under
simultaneous *congruence? That is, what canonical forms may the pair

\[ C^*AC \quad\text{and}\quad C^*BC \]

assume with a single congruence by C? Although this question has been
treated for general Hermitian pairs (with possibly both singular), the
general result is significantly complicated, both to present and to prove.
We state without proof here the canonical pair theorem for Hermitian
matrices, which covers the case in which at least one of the matrices is
nonsingular. We have already treated the special case in which
simultaneous diagonalization by *congruence is possible.

4.5.19 Theorem. Suppose that A, B ∈ M_n are Hermitian, and that A is
nonsingular. There is a positive integer k and a nonsingular C ∈ M_n such
that

\[ C^*AC = \begin{bmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_k \end{bmatrix}, \qquad C^*BC = \begin{bmatrix} B_1 & & & 0 \\ & B_2 & & \\ & & \ddots & \\ 0 & & & B_k \end{bmatrix} \]

where each pair A_i, B_i ∈ M_{n_i}, i = 1, 2, ..., k is one of the two
possible types

    ...    (4.5.20)

    ...    (4.5.21)

with α complex. In (4.5.20), ε is either ±1, and in (4.5.21), n_i is even
and the two nonzero blocks are both in M_{(1/2)n_i}.

Notes

1. In the case that α is real, n_i = 1 is possible, and the two blocks
are then of the form ±α, ±1. Several 1-by-1 blocks corresponding
to the same value of α (and with the same value ε = 1, say) would
produce a block of the form αI in C*BC and I in C*AC.

2. In the case that α is complex, n_i = 2 is possible, and the two
blocks then are of the form

\[ B_i = \begin{bmatrix} 0 & \alpha \\ \bar{\alpha} & 0 \end{bmatrix}, \qquad A_i = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

3. In the circumstance of the theorem, the simultaneous block
structure corresponds exactly to the Jordan canonical form of
A^{-1}B. That is, the basic Jordan blocks of A^{-1}B are precisely the
A_i^{-1}B_i. Note that (C*AC)^{-1}(C*BC) = C^{-1}(A^{-1}B)C, so that C
is also a similarity matrix that takes A^{-1}B to Jordan canonical
form. Thus the form guaranteed by the theorem can be found
from the Jordan canonical form of A^{-1}B (determination of the
inertial factors, the ε terms, is auxiliary).

4.5.22 Remark. Just as the canonical pair form (4.5.19) for two
Hermitian matrices A, B under *congruence is analogous to the Jordan
canonical form of A^{-1}B, there is a canonical pair form for two real
symmetric matrices A, B under real ^Tcongruence that is analogous to the
real Jordan canonical form of A^{-1}B. In it, the blocks B_i of type (4.5.21)
are replaced by the natural analog of blocks of the type (3.4.4), while
other possible block types are the same.

Problems

1. Let A, B ∈ M_n and suppose B is nonsingular. Show that there exists
C ∈ M_n such that A = BC. Moreover, for any nonsingular S ∈ M_n, we
have SAS* = (SBS*)C', where C' is similar to C.

2. Show that if ... (4.5.7), then they have exactly the same number of
positive diagonal entries. The argument given in the text relies on a
corollary of the Courant-Fischer theorem. Provide the details for the
following elementary argument: Suppose D_2 = S*D_1S, and suppose D_1 has
exactly s positive diagonal entries and at least one negative diagonal
entry. Suppose that the first s diagonal entries of D_1 and the first t
diagonal entries of D_2 are positive with 1 ≤ s, t < n. If s < t, show that
there is a nonzero vector x ∈ C^n such that x_{t+1} = x_{t+2} = ... = x_n = 0
and (Sx)_1 = (Sx)_2 = ... = (Sx)_s = 0. Then show that x*D_2x > 0 and
(Sx)*D_1(Sx) ≤ 0 to obtain a contradiction.

3. Let A, B ∈ M_n be Hermitian. Show that the following four
conditions are equivalent: (a) A and B are simultaneously diagonalizable by
*congruence. (b) For some real nonzero scalars a, b, aA + bB and B are
simultaneously diagonalizable by *congruence. (c) A and B are
simultaneously *congruent to a pair of commuting matrices. (d) A + iB is
*congruent to a normal matrix.

4. Use the argument in the proof of (4.5.15) and the commuting family
Theorems (1.3.19) and (4.1.6) to prove the following generalization of
I(b) of Theorem (4.5.15). Let A_1, A_2, ..., A_k ∈ M_n be given Hermitian
matrices with A_1 nonsingular. There exists a nonsingular matrix T ∈
M_n such that T*A_iT is diagonal for all i = 1, 2, ..., k if and only if
(a) A_1^{-1}A_i is similar to a real diagonal matrix for all i = 2, ..., k,
and (b) {A_1^{-1}A_i : i = 2, ..., k} is a commuting family of matrices.
Hint: Let C_i = A_1^{-1}A_i and let SC_iS^{-1} be real diagonal for all
i = 2, ..., k. Let B_i = (S*)^{-1}A_iS^{-1} and show that {B_i} is a commuting
family of Hermitian matrices. There is a unitary matrix U such that UB_iU*
is diagonal for all i = 2, ..., k and T = US is the required congruence
matrix. What is the corresponding generalization of II(b) of (4.5.15)?

5. A differential operator L given by (4.0.4) with a real symmetric
coefficient matrix A(x) = [a_ij(x)] is said to be elliptic at a point
x ∈ D ⊂ R^n if the coefficient matrix A(x) is nonsingular and all its
eigenvalues have the same sign. L is said to be hyperbolic at x if A(x) is
nonsingular, n-1 of the eigenvalues have the same sign, and one eigenvalue
has the opposite sign. Explain why a differential operator that is elliptic
(or hyperbolic) at a point with respect to one coordinate system is elliptic
(or hyperbolic) at that point with respect to every other coordinate
system. Laplace's equation

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = 0 \]

gives an example of an elliptic differential operator, and the wave equation

\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} - \frac{\partial^2 f}{\partial t^2} = 0 \]

an example of a hyperbolic one. Both are presented in Cartesian
coordinates. Both look very different in spherical polar, cylindrical, or
other coordinates.

6. Let X = [X_1, ..., X_n]^T and Y = [Y_1, ..., Y_n]^T be two vectors of real
random variables with finite second moments. It is a fact (see Chapter 7)
that the covariance matrices of X and Y each have only nonnegative
eigenvalues. Suppose that at least one of the covariance matrices is
nonsingular. Show that there exists a real nonsingular matrix S ∈ M_n such
that the covariance matrices of SX and SY are both diagonal. In statistical


terms, this says that a single nonsingular linear transformation S can be
found so that the components of SX and SY are each uncorrelated.

7. Use Problem 4 to give conditions on three or more random vectors
that are sufficient to guarantee the existence of a single nonsingular
linear transformation with the property that the components of the
transformed random vectors are uncorrelated.

8. Case I(b) of Theorem (4.5.15) considers the problem of simultaneous
diagonalization of two Hermitian matrices by *congruence in the case
that at least one of the matrices is nonsingular. Corollary (4.5.18a)
considers the problem of simultaneous diagonalization by unitary
congruence when both matrices may be singular. If the two matrices are both
singular, the problem of simultaneously diagonalizing them by (not
necessarily unitary) *congruence can be reduced eventually to (4.5.15),
but one must look at the behavior of the two matrices on the orthogonal
complement of the intersection of their null spaces. Let A, B ∈ M_n be
Hermitian and assume that both are singular. Let N(A) and N(B) denote
the null spaces of A and B, respectively. (a) Consider

\[ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \]

to show that there exist pairs of singular Hermitian matrices that are
simultaneously diagonalizable by *congruence. (b) Suppose
N(A) ∩ N(B) = {0}. Show that if A and B are simultaneously diagonalizable
by *congruence, then there exists a real number a such that aA + B is
nonsingular. Hint: If C ∈ M_n is nonsingular, C*AC = Λ_1, and C*BC = Λ_2
with Λ_1 and Λ_2 diagonal, show that the zero main diagonal entries of Λ_1
and Λ_2 do not fall in the same positions. Can you select a so that all the
main diagonal entries of aΛ_1 + Λ_2 are nonzero? (c) Use (b) to show that

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \]

cannot be diagonalized simultaneously by *congruence. (d) If
N(A) ∩ N(B) = {0}, if a ∈ R is nonzero, and if aA + B is nonsingular, use
Problem 3(b) to show that A and B are simultaneously diagonalizable by
*congruence if and only if (aA + B)^{-1}B is diagonalizable and has only
real eigenvalues. (e) If dim N(A) ∩ N(B) = k ≥ 1, let {u_1, u_2, ..., u_n}
be an orthonormal basis of C^n for which {u_1, u_2, ..., u_k} is an
orthonormal basis of N(A) ∩ N(B). If U = [u_1 u_2 ... u_n] ∈ M_n, show that

\[ U^*AU = \begin{bmatrix} 0 & 0 \\ 0 & A' \end{bmatrix} \quad\text{and}\quad U^*BU = \begin{bmatrix} 0 & 0 \\ 0 & B' \end{bmatrix} \]

where A', B' ∈ M_{n-k}, N(A') ∩ N(B') = {0}, and the upper left-hand
corner zero blocks are k-by-k. Show that A and B are simultaneously
diagonalizable by *congruence if and only if A' and B' are
simultaneously diagonalizable by *congruence. Although A' and B' may both
be singular, the intersection of their null spaces is trivial. (f) Assemble
the information from (a)-(e) to state and prove a general theorem
on simultaneous diagonalization of two Hermitian matrices by
*congruence.

9. If A, B ∈ M_n and B is nonsingular, show that A commutes with B if
and only if A commutes with B^{-1}.

10. Show that [...] and [...] can be reduced simultaneously to
diagonal form by a unitary ^Tcongruence but cannot be reduced
simultaneously to diagonal form by *congruence. Use the construction
employed in the proof of case II(b) of (4.5.15) to carry out the reduction
and find an explicit unitary ^Tcongruence matrix which works.

11. Show that [...] and [...] cannot be reduced simultaneously to
diagonal form by either *congruence or ^Tcongruence.

12. Let A, B ∈ M_n with A nonsingular. Show that each of the
following conditions is necessary and sufficient for A and B to be
simultaneously diagonalizable by congruence in the sense of, and under the
assumptions of, each of the indicated cases of Theorem (4.5.15) (Table
4.5.15T).

Case     Necessary and sufficient condition

I(a)     There exists a Hermitian F ∈ M_n such that B = AF.
I(b)     There exists a diagonalizable F ∈ M_n with real eigenvalues such
         that B = AF.
II(a)    There exists a normal F ∈ M_n such that B = AF.
II(b)    There exists a diagonalizable F ∈ M_n such that B = AF.
III(a)   There exists a symmetric F ∈ M_n such that B = AF.
III(b)   There exists a condiagonalizable matrix F ∈ M_n such that B = AF
         [see (4.6.2)].

13. Let A, B ∈ M_n be symmetric (possibly both singular) and suppose
that there is a unitary U ∈ M_n such that UAU^T = Λ and UBU^T = M are
both diagonal. Show that there exists a unitary matrix V such that
B\bar{A} = AV\bar{B}. Hint: If Λ = diag(λ_1, λ_2, ..., λ_n), show that there
exists a unitary diagonal matrix D such that \bar{Λ} = DΛ = ΛD. Then show
that

\[ B\bar{A} = U^*M\bar{\Lambda}U = U^*\Lambda D_1D_2\bar{M}U = A(U^TD_1D_2\bar{U})\bar{B} \]

where D_1 and D_2 are unitary diagonal matrices.

14. Use the necessary condition in Problem 13 to show that the two
symmetric matrices in Problem 8(c) cannot be diagonalized simultaneously
by unitary ^Tcongruence. Hint: Compute the first column of B\bar{A} and of
AV\bar{B}. Use (4.5.18b) to show the same thing more easily.

15. If A, B ∈ M_n are symmetric, show that the necessary condition for
simultaneous diagonalization by unitary ^Tcongruence in Problem 13 is
also a sufficient condition provided that both A and B are nonsingular.
Hint: If B\bar{A} = AV\bar{B} with both A and B nonsingular, then
A^{-1}B\bar{A}\bar{B}^{-1} = V and I = VV*. This implies that
\bar{A}\bar{B}^{-1}B^{-1}A = B^{-1}A\bar{A}\bar{B}^{-1}. Taking the
inverse of both sides implies that A^{-1}B is normal.

16. Let A, B ∈ M_n be symmetric (possibly both singular) and suppose
that there is a unitary U ∈ M_n such that UAU^T = Λ and UBU^T = M are
both diagonal. Show that A\bar{A} commutes with B\bar{B}. Show that this
necessary condition for simultaneous diagonalization by unitary
^Tcongruence is not sufficient by considering the two matrices in Problem
8(c). Use Corollary (4.4.5) to show that this necessary condition is
sufficient provided that both A\bar{A} and B\bar{B} have n distinct
eigenvalues.

17. Let A, B ∈ M_n with A Hermitian and B symmetric and suppose there
is a unitary U ∈ M_n such that UAU* = Λ and UBU^T = M are both
diagonal. Show that A commutes with B\bar{B}. Show that this necessary
condition for simultaneous diagonalization by (mixed * and T) congruence is
not sufficient by considering the two matrices in Problem 11. Use
Corollary (4.4.5) to show that this necessary condition is sufficient
provided that all the eigenvalues of B\bar{B} are distinct.

18. Let A, B ∈ M_n with A and B symmetric and A nonsingular. Show
that if the generalized characteristic polynomial p_{A,B}(t) = det(tA - B)
has n distinct zeroes, then A and B are simultaneously diagonalizable by
^Tcongruence. Hint: What are the eigenvalues of A^{-1}B?

19. Provide the details for the following alternative proof of Sylvester's
law of inertia (4.5.8). If A ∈ M_n is Hermitian and nonsingular and if
S ∈ M_n is nonsingular, let S = QR be a factorization in which Q ∈ M_n is
unitary and R ∈ M_n is upper triangular (2.6.1) with positive main
diagonal. Show that S(t) = tQ + (1-t)QR is nonsingular if 0 ≤ t ≤ 1 and let
A(t) = S(t)AS(t)*. What is A(0)? A(1)? Since A(t) is nonsingular and
changes continuously as t goes from zero to 1, argue that A(0) has the
same number of positive (negative) eigenvalues as A(1). Treat the general
case by considering A + εI for small ε > 0.

20. If A = \begin{bmatrix} B & C \\ 0 & 0 \end{bmatrix} ∈ M_n with B ∈ M_k,
1 ≤ k < n, show that A is normal if and only if B is normal and C = 0.
Hint: Compute AA* and A*A. If C*C = 0, then (Cx)*(Cx) = 0 for all
x ∈ C^{n-k} and hence Cx = 0 for all x ∈ C^{n-k}.

21. Show that the method of proof used in (b) of (4.5.18) can also be
used to prove parts (a) and (c).

22. Let ℱ = {A_1, ..., A_k} ⊂ M_n be a given family of complex symmetric
matrices and let 𝒢 = {A_i\bar{A}_j : i, j = 1, 2, ..., k}. If there exists a
unitary U ∈ M_n such that UA_iU^T is diagonal for all i = 1, ..., k, show
that 𝒢 is a commuting family. What does this reduce to when k = 2, and what
is the connection with (4.5.18b)? In fact, commutativity of 𝒢 is also
sufficient to ensure the simultaneous diagonalizability of ℱ by unitary
^Tcongruence; see the paper of Hong and Horn listed in the references at
the end of this section.

23. Let ℱ = {A_1, ..., A_k} ⊂ M_n be a given family of complex symmetric
matrices, let ℋ = {B_1, ..., B_m} ⊂ M_n be a given family of Hermitian
matrices, and let 𝒢 = {A_i\bar{A}_j : i, j = 1, ..., k}. If there is a
unitary U ∈ M_n such that every UA_iU^T and every UB_jU* is diagonal, show
that each of 𝒢 and ℋ is a commuting family and B_jA_i is symmetric for all
i = 1, ..., k and all j = 1, ..., m. What does this reduce to when
k = m = 1, and what is the connection with (4.5.18c)? In fact, these
conditions are also sufficient to ensure the simultaneous diagonalizability
of ℱ and ℋ by the respective congruences; see the paper of Hong and Horn
listed in the references at the end of this section.

Further Readings. Ostrowski's proof of (4.5.9) and related results are in
"A Quantitative Formulation of Sylvester's Law of Inertia," Proc. Nat.
Acad. Sci. 45 (1959), 740-744. Another version of Theorem (4.5.25) is
stated in [GLR 82]; a careful proof including the case in which both
matrices are singular is contained in unpublished notes by R. C. Thompson.
For results about simultaneous diagonalization of more than two
matrices, see Y. P. Hong and R. A. Horn, "On Simultaneous Reduction of
Families of Matrices to Triangular or Diagonal Form by Unitary
Congruence," Linear and Multilinear Algebra 17 (1985), 271-288.


4.6 Consimilarity and condiagonalization

The motivation for the topic of this section comes from three results
in the preceding two sections. Theorem (4.4.3) characterizes all matrices
of the form UΔU^T, where Δ is upper triangular and U is unitary; for
our present purposes we prefer to write this factorization as UΔU^T =
UΔ\bar{U}^{-1}. Corollary (4.4.4) characterizes all matrices of the form
UΣU^T = UΣ\bar{U}^{-1}, where Σ is diagonal, and case III of Theorem (4.5.15)
requires information about when a given square complex matrix A can be
reduced to diagonal form by the transformation A → SA\bar{S}^{-1} for some
nonsingular S.

4.6.1 Definition. Two matrices A, B ∈ M_n are said to be consimilar if
there exists a nonsingular S ∈ M_n such that A = SB\bar{S}^{-1}. If the
matrix S can be taken to be unitary, A and B are said to be unitarily
consimilar.

If A = SB\bar{S}^{-1} and S = U is unitary, then A = SB\bar{S}^{-1} = UBU^T;
if S = Q is complex orthogonal, then A = SB\bar{S}^{-1} = QBQ*; if S = R is a
real nonsingular matrix, then A = SB\bar{S}^{-1} = RBR^{-1}. Thus, special
cases of consimilarity include ^Tcongruence, *congruence, and ordinary
similarity. Like ordinary similarity, consimilarity is an equivalence
relation on M_n, and we may ask which equivalence classes contain
triangular or diagonal representatives.

(a) A is contriangularizable;
(b) A is unitarily contriangularizable; and
(c) All the eigenvalues of A\bar{A} are real and nonnegative.

If A ∈ M_n is unitarily condiagonalizable, then A = UΛ\bar{U}^{-1} = UΛU^T
for some unitary U ∈ M_n and Λ = diag(λ_1, ..., λ_n). Thus,
A^T = (UΛU^T)^T = UΛ^TU^T = UΛU^T = A, and hence A is symmetric. Corollary
(4.4.4) says that the converse is true as well, and that the diagonal
matrix can always be taken to be nonnegative. Thus, the problem of unitary
condiagonalization has also been solved already.

4.6.4 Theorem. A matrix A ∈ M_n is unitarily condiagonalizable if and
only if it is symmetric.

The remaining problem concerning contriangularization and
condiagonalization is to characterize usefully those matrices that can be
condiagonalized by a consimilarity that is not necessarily unitary.

If A ∈ M_n is condiagonalizable and S^{-1}A\bar{S} = Λ = diag(λ_1, ..., λ_n),
then A\bar{S} = SΛ. If S = [s_1 ... s_n] with each s_i ∈ C^n, this identity
says that A\bar{s}_i = λ_i s_i for i = 1, ..., n. This equation is similar to,
but crucially different from, the usual eigenvector-eigenvalue equation.

4.6.5 Definition. Let A ∈ M_n be given. A nonzero vector x ∈ C^n such


that AX=x for some \€C is said to be a coneigenvector of A; the 
scalar is a coneigenvalue of A. 

The identity AS = SA says that every nonzero column of S is a con- 
eigenvector of A. Since the columns of S are independent if and only if 
S is nonsingular, we see that a matrix A €M, is condiagonalizable if 
and only if it has n independent coneigenvectors. To this extent, the 
theory of condiagonalization is entirely analogous to the theory of ordi- 
nary diagonalization. 

But every matrix has at least one eigenvalue, and it has only finite- 
ly many distinct eigenvalues; in this regard, the theory of coneigen- 
values is rather different. If AX =x, then e AF = A(ex) =e" x= 
(e-7"r) (ex) for all 0 e R. Thus, if \ is a coneigenvalue of A, then so is 
e”) for all @€ R. On the other hand, if AX =x, then AAx = A(AX) = 
A(x) = X\AX = Xx =|A|2x, so a scalar dis a coneigenvalue of A only if 
|\? is an eigenvalue of AA. The example A = E “Ol for which AA = 
-217 has no nonnegative eigenvalues, shows that there are matrices that 


have no coneigenvalues at all. It is known, however, that if Ae M, and n 
s odd, then A must have at least one coneigenvalue, a result analogous 


4.6.2 Definition. A matrix A e M, is said to be contriangularizable if 
there exists a nonsingular Se M, such that S~!AS is upper triangular; it 
is said to be condiagonalizable if S can be chosen so that S~!AS is diag- 
onal. It is said to be unitarily contriangularizable or unitarily condiag- 
onalizable if it can be reduced by consimilarity to the required form viaa . 
unitary matrix. | 

If A e M, is contriangularizable, and if S~'!AS = A is upper triangular, 
then an explicit calculation shows that the main diagonal entries of Aå = ` 
S~'(AA)S are nonnegative. Consequently, all the eigenvalues of AA are - 
nonnegative. But then Theorem (4.4.3) says that there is a unitary U suc 
that VAUT = UAŪ `! is upper triangular. Thus, the problem of decidin 
whether a given matrix can be reduced to upper triangular form by con 
similarity has already been solved. 


4.6.3 Theorem. Let Ae M, be given. The following statements ar 
equivalent: 


246 Hermitian and symmetric matrices 


to the fact that every real matrix of odd order has at least one real 
eigenvalue. . 

Thus, in contrast to the theory of ordinary eigenvalues, a matrix may 
have infinitely many distinct coneigenvalues or it may have no coneigen- 
values at all. If a matrix has a coneigenvalue, it is sometimes convenient 
to select from among the coneigenvalues of equal modulus the unique 
nonnegative one as a representative. 
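The two phenomena just described can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text: the diagonal test matrix, the angle `theta`, and all helper names are illustrative assumptions. It verifies that rotating a coneigenvector by $e^{i\theta}$ rotates its coneigenvalue by $e^{-2i\theta}$, and that the real matrix with rows $(0, 1)$ and $(-1, 0)$ satisfies $A\bar{A} = -I$, so it can have no coneigenvalue.

```python
import cmath

def conj_vec(x):
    """Entrywise complex conjugate of a vector."""
    return [complex(v).conjugate() for v in x]

def conj_mat(M):
    """Entrywise complex conjugate of a matrix stored as a list of rows."""
    return [conj_vec(row) for row in M]

def matvec(M, x):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def matmul(A, B):
    """Matrix-matrix product."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Assumed example: for A = diag(2, 3), x = (1, 0) satisfies A conj(x) = 2 x.
A = [[2, 0], [0, 3]]
x, lam = [1, 0], 2

theta = 0.7  # any real angle would do
x_rot = [cmath.exp(1j * theta) * v for v in x]
lam_rot = cmath.exp(-2j * theta) * lam

# Check A conj(e^{i theta} x) = (e^{-2 i theta} lam) (e^{i theta} x).
lhs = matvec(A, conj_vec(x_rot))
rhs = [lam_rot * v for v in x_rot]
rotation_residual = max(abs(l - r) for l, r in zip(lhs, rhs))

# A matrix with A conj(A) = -I: no nonnegative eigenvalue, hence no coneigenvalue.
J = [[0, 1], [-1, 0]]
J_conjJ = matmul(J, conj_mat(J))
```

Here `rotation_residual` should be at rounding level and `J_conjJ` should equal $-I$; the modulus of `lam_rot` stays equal to that of `lam`, which is why the nonnegative representative is a natural choice.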

The necessary condition we have just observed for the existence of a 
coneigenvalue is also sufficient. 


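4.6.6 Proposition. Let $A \in M_n$ and let $\lambda \ge 0$ be given. Then $\lambda$ is an
eigenvalue of $A\bar{A}$ if and only if $+\sqrt{\lambda}$ is a coneigenvalue of $A$.


Proof: If $\lambda \ge 0$, $\sqrt{\lambda} \ge 0$, and $A\bar{x} = \sqrt{\lambda}\,x$ for some $x \ne 0$, then
$A\bar{A}x = A(\overline{A\bar{x}}) = A(\overline{\sqrt{\lambda}\,x}) = \sqrt{\lambda}\,A\bar{x} = \sqrt{\lambda}\sqrt{\lambda}\,x = \lambda x$.
Conversely, if $A\bar{A}x = \lambda x$ for some $x \ne 0$, there are two possibilities:


(a) $A\bar{x}$ and $x$ are dependent; or
(b) $A\bar{x}$ and $x$ are independent.


In the former case, there is some $\mu \in C$ such that $A\bar{x} = \mu x$, which says that
$\mu$ is a coneigenvalue of $A$. But then
$\lambda x = A\bar{A}x = A(\overline{A\bar{x}}) = A(\overline{\mu x}) = \bar{\mu}A\bar{x} = \bar{\mu}\mu x = |\mu|^2 x$,
so $|\mu| = +\sqrt{\lambda}$. Since $e^{-2i\theta}\mu$ is a coneigenvalue associated
with the coneigenvector $e^{i\theta}x$ for any $\theta \in R$, we conclude that $+\sqrt{\lambda}$ is a
coneigenvalue of $A$. Notice that
$A\bar{A}(A\bar{x}) = A(\overline{A\bar{A}x}) = A(\overline{\lambda x}) = \lambda(A\bar{x})$
and $A\bar{A}x = \lambda x$, so if $\lambda$ is a simple eigenvalue of $A\bar{A}$, (a) must always be
the case.

In the latter case (b) (which could occur if $\lambda$ is a multiple eigenvalue of
$A\bar{A}$), the vector $y = A\bar{x} + \sqrt{\lambda}\,x$ is nonzero and is a coneigenvector
corresponding to the coneigenvalue $+\sqrt{\lambda}$ since


$A\bar{y} = A\bar{A}x + \sqrt{\lambda}\,A\bar{x} = \lambda x + \sqrt{\lambda}\,A\bar{x} = \sqrt{\lambda}(A\bar{x} + \sqrt{\lambda}\,x) = \sqrt{\lambda}\,y$ □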
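The construction in case (b) of the proof is explicit enough to run. The sketch below is illustrative only: the matrix `A` (chosen so that $A\bar{A} = 4I$), the starting vector `x`, and the function name are assumptions, not taken from the text. It forms $y = A\bar{x} + \sqrt{\lambda}\,x$ from an eigenvector $x$ of $A\bar{A}$ and checks that $A\bar{y} = \sqrt{\lambda}\,y$.

```python
import math

def conj_vec(x):
    """Entrywise complex conjugate of a vector."""
    return [complex(v).conjugate() for v in x]

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def coneigenvector_from_eigenvector(A, x, lam):
    """Return (y, sqrt(lam)) with y = A conj(x) + sqrt(lam) x.

    Assumes A conj(A) x = lam x with lam >= 0. The vector y is a
    coneigenvector whenever it is nonzero, which is guaranteed when
    A conj(x) and x are independent (case (b) of the proof).
    """
    root = math.sqrt(lam)
    Ax_bar = matvec(A, conj_vec(x))
    y = [a + root * v for a, v in zip(Ax_bar, x)]
    return y, root

# Assumed example: A = 2 * [[0, i], [i, 0]] satisfies A conj(A) = 4 I,
# and for x = (1, 0) the vectors A conj(x) = (0, 2i) and x are independent.
A = [[0, 2j], [2j, 0]]
x, lam = [1, 0], 4

y, root = coneigenvector_from_eigenvector(A, x, lam)

# Residual of the coneigenvector equation A conj(y) = sqrt(lam) y.
residual = max(abs(a - root * b) for a, b in zip(matvec(A, conj_vec(y)), y))
```

For this choice the sketch yields $y = (2, 2i)$ with coneigenvalue $2 = \sqrt{4}$.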


We have seen that to each distinct nonnegative eigenvalue of AA there 
corresponds a coneigenvector of A, a result analogous to the ordinary 
theory of eigenvectors. The following result extends this analogy a bit 


further. 


4.6.7 Proposition. Let $A \in M_n$ be given, and let $x_1, x_2, \ldots, x_k$ be
coneigenvectors of $A$ with corresponding coneigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. If
$|\lambda_i| \ne |\lambda_j|$ whenever $1 \le i, j \le k$ and $i \ne j$, then $\{x_1, \ldots, x_k\}$ is an
independent set.


Proof: Each $x_i$ is an eigenvector of $A\bar{A}$ with associated eigenvalue $|\lambda_i|^2$.
The vectors $x_1, \ldots, x_k$ are independent by (1.3.8) because they are
eigenvectors of the matrix $A\bar{A}$ and their associated eigenvalues $|\lambda_1|^2, \ldots, |\lambda_k|^2$
are distinct by assumption. □


This result, together with Proposition (4.6.6), gives a lower bound on 
the number of independent coneigenvectors of a given matrix and yields 
a sufficient condition for condiagonalizability that is analogous to a 
familiar sufficient condition for ordinary diagonalizability. We give a 
more general condition in Theorem (4.6.11). 


4.6.8 Corollary. Let $A \in M_n$ be given. If $A\bar{A}$ has $k$ distinct nonnegative
eigenvalues, then $A$ has at least $k$ independent coneigenvectors. If $k = n$,
$A$ is condiagonalizable. If $k = 0$, $A$ has no coneigenvectors at all.

These bounds on the number of independent coneigenvectors are
sharp. For $A = J_n(1)$, an elementary Jordan block


$J_n(1) = \begin{bmatrix} 1 & 1 & & 0 \\ & 1 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 1 \end{bmatrix} \in M_n$


$A\bar{A} = J_n^2(1)$ has 1 as its only nonnegative eigenvalue. The coneigenvector
equation $A\bar{x} = x$ is easily seen to have only real solutions, so every
coneigenvector is also an eigenvector, and the subspace of eigenvectors is
one-dimensional. Direct sums of elementary Jordan blocks can therefore
be used to give examples of matrices $A \in M_n$ such that $A\bar{A}$ has $k$ distinct
nonnegative eigenvalues and $A$ has exactly $k$ independent coneigenvectors
for any integer $k$ such that $1 \le k \le n$.

Our objective is to give a simple condition for a given matrix to be
condiagonalizable, and as a first step we prove the following lemma. The
motivation for this result is that if a given matrix $A \in M_n$ is consimilar to a
scalar matrix, then $A = S(\lambda I)\bar{S}^{-1} = \lambda S\bar{S}^{-1}$ and
$A\bar{A} = \lambda S\bar{S}^{-1}\bar{\lambda}\bar{S}S^{-1} = |\lambda|^2 I$.
Matrices with this property (that $A\bar{A}$ is a scalar matrix) are the basic
building blocks from which condiagonalizable matrices are constructed.


4.6.9 Lemma. A matrix $A \in M_n$ has the property that $A\bar{A} = I$ if and
only if there exists a nonsingular $S \in M_n$ such that $A = S\bar{S}^{-1}$.


Proof: We have just seen that the stated condition is necessary. To show
that it is sufficient, define $S_\theta = e^{i\theta}A + e^{-i\theta}I$ for any $\theta \in R$ and observe that


$A\bar{S}_\theta = A(e^{-i\theta}\bar{A} + e^{i\theta}I) = e^{-i\theta}A\bar{A} + e^{i\theta}A = e^{i\theta}A + e^{-i\theta}I = S_\theta$    (4.6.10)


Since $A$ has only finitely many eigenvalues, there is some $\theta_0 \in R$ such that
$-e^{-2i\theta_0}$ is not an eigenvalue of $A$. For this value of $\theta$,

$S_{\theta_0} = e^{i\theta_0}(A + e^{-2i\theta_0}I)$

is nonsingular and $A = S_{\theta_0}\bar{S}_{\theta_0}^{-1}$ from (4.6.10). □


We can now state and prove a necessary and sufficient condition for
condiagonalizability.


4.6.11 Theorem. Let $A \in M_n$. There exists a nonsingular $S \in M_n$ and a
diagonal $\Lambda \in M_n$ such that $A = S\Lambda\bar{S}^{-1}$ if and only if $A\bar{A}$ is a
diagonalizable matrix with real nonnegative eigenvalues and rank $A$ = rank $A\bar{A}$.


Proof: The stated conditions are clearly necessary since

$A\bar{A} = S\Lambda\bar{S}^{-1}\bar{S}\bar{\Lambda}S^{-1} = S|\Lambda|^2 S^{-1}$

and the rank of both $A\bar{A}$ and $A$ is the number of nonzero diagonal entries
in $\Lambda$. Conversely, if $A\bar{A}$ is diagonalizable and has nonnegative eigenvalues,
there is a nonsingular $S \in M_n$ and a nonnegative diagonal $\Lambda \in M_n$ such that
$A\bar{A} = S\Lambda S^{-1}$. There is no loss of generality to assume that like diagonal
entries in $\Lambda$ are grouped together and that
$\Lambda = \lambda_1 I_{n_1} \oplus \lambda_2 I_{n_2} \oplus \cdots \oplus \lambda_k I_{n_k}$,
where $I_{n_i} \in M_{n_i}$ and $\lambda_1 > \lambda_2 > \cdots > \lambda_k \ge 0$. We then have


$S^{-1}A\bar{A}S = S^{-1}A\bar{S}\,\bar{S}^{-1}\bar{A}S = (S^{-1}A\bar{S})(\overline{S^{-1}A\bar{S}}) = \Lambda$


If we set $B = S^{-1}A\bar{S}$, then (since consimilarity is an equivalence relation)
it will suffice to show that $B$ is condiagonalizable if $B\bar{B} = \Lambda$. Since $\Lambda$ is real,
$\Lambda = \bar{\Lambda} = \overline{(B\bar{B})} = \bar{B}B = B\bar{B}$, so $B$ and $\bar{B}$ commute. Thus,
$B\Lambda = B(\bar{B}B) = B\bar{B}B = (B\bar{B})B = \Lambda B$, so $B$ and $\Lambda$ also commute.
If we write $B$ in block form as


$B = \begin{bmatrix} B_{11} & B_{12} & \cdots & B_{1k} \\ B_{21} & B_{22} & & \vdots \\ \vdots & & \ddots & \\ B_{k1} & \cdots & & B_{kk} \end{bmatrix}$


with block sizes conformal to those of


$\Lambda = \begin{bmatrix} \lambda_1 I_{n_1} & & 0 \\ & \ddots & \\ 0 & & \lambda_k I_{n_k} \end{bmatrix}, \quad I_{n_i} \in M_{n_i}, \quad i = 1, 2, \ldots, k$


then the equation $B\Lambda = \Lambda B$ says that $\lambda_j B_{ij} = \lambda_i B_{ij}$ for all $i, j = 1, 2, \ldots, k$.
Since $\lambda_i \ne \lambda_j$ if $i \ne j$, we conclude that $B_{ij} = 0$ if $i \ne j$, and hence $B$ is
block diagonal


$B = \begin{bmatrix} B_{11} & & 0 \\ & \ddots & \\ 0 & & B_{kk} \end{bmatrix}$


with diagonal blocks the same size as those of $\Lambda$. The equation $B\bar{B} = \Lambda$
means that each $B_{ii}\bar{B}_{ii} = \lambda_i I_{n_i}$ for $i = 1, 2, \ldots, k$. Notice that $B_{ii}$ must be
nonsingular if $\lambda_i > 0$. If $\lambda_i > 0$, we can write this equation as


$\left(\frac{1}{\sqrt{\lambda_i}}B_{ii}\right)\overline{\left(\frac{1}{\sqrt{\lambda_i}}B_{ii}\right)} = I_{n_i}$


and we can use Lemma (4.6.9) to conclude that there is a nonsingular
$S_i \in M_{n_i}$ such that $B_{ii} = S_i(\sqrt{\lambda_i}\,I_{n_i})\bar{S}_i^{-1}$. If $\lambda_k = 0$, then


rank $B_{11}$ + rank $B_{22}$ + $\cdots$ + rank $B_{kk}$
= rank $B$ = rank $A$ = rank $A\bar{A}$ = rank $\Lambda$ = $n_1 + n_2 + \cdots + n_{k-1}$


This means that the rank of $B_{kk}$ is zero, so the last block $B_{kk}$ must
actually be a zero block if $\lambda_k = 0$. In this event, we can write
$0 = B_{kk} = S_k(\sqrt{\lambda_k}\,I_{n_k})\bar{S}_k^{-1}$, where $S_k \in M_{n_k}$ is an arbitrary nonsingular
matrix. If we set $S = S_1 \oplus \cdots \oplus S_k$, we have shown in all cases that


$B = S(\sqrt{\lambda_1}\,I_{n_1} \oplus \cdots \oplus \sqrt{\lambda_k}\,I_{n_k})\bar{S}^{-1}$

and we are done. □


When applied to case III(b) of Theorem (4.5.15), the necessary and
sufficient conditions for condiagonalizability have the following
consequence. Let $A, B \in M_n$ be given with $A$ Hermitian and $B$ symmetric and at
least one of $A, B$ nonsingular. Set $C = A^{-1}B$ or $B^{-1}A$ depending on
which is nonsingular. There exists a nonsingular $S \in M_n$ such that both
$SAS^*$ and $SBS^T$ are diagonal if and only if $C\bar{C}$ has all nonnegative
eigenvalues and is diagonalizable and rank $C$ = rank $C\bar{C}$.

The special case in which $A$ is a complex symmetric matrix is handled
easily by the theorem, since $A\bar{A} = AA^*$ is Hermitian in this case and
hence is diagonalizable. Moreover, rank $A$ = rank $AA^*$ for any $A \in M_n$,
so the hypotheses of the theorem are satisfied when $A$ is a complex
symmetric matrix. The theorem shows that every complex symmetric matrix
is condiagonalizable but does not yield directly the fact that the
condiagonalization can be accomplished with a unitary transformation in
this case. See Problem 22 at the end of this section.
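The proof of Lemma (4.6.9) is constructive, and a small numerical sketch may help make it concrete. The code below is an illustration under stated assumptions, not the authors' algorithm: it handles only 2-by-2 matrices, the sample matrix `A` (with $A\bar{A} = I$) is chosen for the example, and instead of computing the eigenvalues of $A$ it simply tries a few candidate angles until $S_\theta = e^{i\theta}A + e^{-i\theta}I$ is nonsingular.

```python
import cmath

def conj_mat(M):
    """Entrywise complex conjugate of a 2x2 matrix stored as a list of rows."""
    return [[complex(v).conjugate() for v in row] for row in M]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def s_theta(A, theta):
    """Form S_theta = e^{i theta} A + e^{-i theta} I."""
    p, q = cmath.exp(1j * theta), cmath.exp(-1j * theta)
    return [[p * A[i][j] + (q if i == j else 0) for j in range(2)]
            for i in range(2)]

def consimilarity_factor(A, thetas=(0.0, 0.5, 1.0), tol=1e-9):
    """Return a nonsingular S_theta, assuming A conj(A) = I.

    A 2x2 matrix has at most two eigenvalues, so at most two of the three
    candidate angles (distinct modulo pi) can give a singular S_theta.
    """
    for theta in thetas:
        S = s_theta(A, theta)
        if abs(det2(S)) > tol:
            return S
    raise ValueError("no candidate angle gave a nonsingular S_theta")

# Assumed example: A = [[1, i], [0, 1]] satisfies A conj(A) = I.
A = [[1, 1j], [0, 1]]
S = consimilarity_factor(A)

# Reconstruct A as S conj(S)^{-1} and measure the largest entrywise error.
recon = matmul(S, inv2(conj_mat(S)))
error = max(abs(recon[i][j] - A[i][j]) for i in range(2) for j in range(2))
```

With the first candidate angle, $S = A + I$, and the reconstruction error should be at rounding level.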




These observations on consimilarity and condiagonalization help
to put into perspective Takagi's factorization (4.4.4) for complex
symmetric matrices and Theorem (4.4.3) on triangularization by unitary
congruence. Theorem (4.4.3) says that every matrix $A \in M_n$ such that $A\bar{A}$
has all nonnegative eigenvalues can be unitarily contriangularized, and
Takagi's result says that every complex symmetric matrix can be unitarily
condiagonalized.

Since there is no useful distinction between “real” and “not real” for 
coneigenvalues, there is no distinction between “unitarily condiagonaliz- 
able with real (or positive) coneigenvalues,” which would be an analog of 
Hermitian (or positive definite), and “unitarily condiagonalizable with 
complex coneigenvalues,” which would be an analog of normal. Thus, 
complex symmetric matrices may be thought of as an analog (for con- 
similarity) of the whole class of normal matrices (for ordinary simi- 
larity), and Takagi’s factorization may be thought of as an analog of the 
spectral theorem for normal matrices (2.5.4a, b). 

The theory of ordinary similarity arises as a result of studying linear
transformations referred to different bases. In its general context,
consimilarity arises as a result of studying antilinear transformations referred
to different bases. An antilinear transformation $T$ is a mapping $T: V \to W$
from one complex vector space into another that is additive [$T(x+y) =
Tx + Ty$ for all $x, y \in V$] but conjugate homogeneous [$T(ax) = \bar{a}Tx$ for all
$a \in C$ and all $x \in V$, sometimes called antihomogeneous]. Such
transformations occur in quantum mechanics in the study of time reversal.

The class of condiagonalizable matrices is a wide one, which includes
all real diagonalizable matrices with real eigenvalues, all (real or
complex) symmetric matrices, and all matrices of the form $H^2S$ with $H$
Hermitian and $S$ symmetric (see Problems 10 and 11 at the end of this
section). The latter observation is the basis for the second of the
following useful sufficient conditions. A positive definite matrix $A \in M_n$ is a
nonsingular Hermitian matrix such that $x^*Ax > 0$ for all nonzero $x \in C^n$;
an equivalent condition on a Hermitian matrix $A$ is that all the
eigenvalues of $A$ are positive or that $A = H^2$ for some nonsingular Hermitian
matrix $H$ (see Chapter 7).


4.6.12 Corollary. Let $A, B \in M_n$ with $A$ Hermitian and positive definite.


(a) If $B$ is Hermitian, then there exists a nonsingular $S \in M_n$ such
that $SAS^* = I$ and $SBS^*$ is real diagonal.

(b) If $B$ is symmetric, then there exists a nonsingular $S \in M_n$ such
that $SAS^* = I$ and $SBS^T$ is real diagonal with nonnegative main
diagonal entries.




Proof: Let $A = H^2$, where $H \in M_n$ is a nonsingular Hermitian matrix.

(a) $C = A^{-1}B = H^{-2}B$, so $C$ is similar to $HCH^{-1} = H(H^{-2}B)H^{-1} =
H^{-1}BH^{-1}$, which is Hermitian and hence is diagonalizable with real
eigenvalues; the matrix $C$ must also be diagonalizable with real eigenvalues.
Thus, $A$ and $B$ are simultaneously diagonalizable by *congruence by
I(b)(2) of (4.5.15). If $H^{-1}BH^{-1} = U\Lambda U^*$ with $U$ unitary and $\Lambda$ diagonal,
the nonsingular matrix $S = U^*H^{-1}$ will make $SAS^* = I$ and $SBS^* = \Lambda$.

(b) $C = A^{-1}B = H^{-2}B$, so $C\bar{C} = H^{-2}B\bar{H}^{-2}\bar{B}$ is similar to


$H(C\bar{C})H^{-1} = H^{-1}B\bar{H}^{-2}\bar{B}H^{-1} = (H^{-1}B\bar{H}^{-1})(H^{-1}B\bar{H}^{-1})^*$


which is Hermitian and positive semidefinite and hence is diagonalizable
with nonnegative eigenvalues. But


rank$(C\bar{C})$ = rank$(H^{-1}B\bar{H}^{-1})(H^{-1}B\bar{H}^{-1})^*$ = rank$(H^{-1}B\bar{H}^{-1})$


by (0.4.6d), and rank$(H^{-1}B\bar{H}^{-1})$ = rank$(H^{-2}B)$ = rank $C$ by (0.4.6b).
By (4.6.11), therefore, condition III(b)(1) of (4.5.15) is satisfied and
so there must be a nonsingular matrix $S \in M_n$ such that $SAS^*$ and
$SBS^T$ are both diagonal. Observe that $HC(H^{-1})^T = H(H^{-2}B)(H^{-1})^T =
H^{-1}B(H^{-1})^T$ is symmetric, so by (4.4.4) there is a unitary matrix $U$ and a
nonnegative diagonal matrix $\Sigma$ such that $H^{-1}B(H^{-1})^T = U\Sigma U^T$, or
$(U^*H^{-1})B(U^*H^{-1})^T = \Sigma$. If we set $S = U^*H^{-1}$, then $SAS^* = I$ as well. □


We have considered consimilarity to a diagonal matrix, but not every
matrix is condiagonalizable, and it is natural to ask whether there is some
simple form that any matrix can be reduced to under consimilarity.
There is a normal form under consimilarity that plays a role analogous to
the Jordan form for ordinary similarity. Using it, one can show that for
each $A \in M_n$, $A$ is consimilar to $\bar{A}$, $A^*$, and $A^T$ [compare with (3.2.3)], $A$
is consimilar to a Hermitian matrix [compare with (4.4.9)], $A$ is
consimilar to a real matrix, and there are nonsingular symmetric matrices
$S_1, S_2 \in M_n$ and Hermitian matrices $H_1, H_2 \in M_n$ such that $A = S_1H_1 =
H_2S_2$ [compare with Corollary (4.4.11)]. In fact, one can reduce the
whole question of consimilarity to more familiar notions: Two matrices
$A, B \in M_n$ are consimilar if and only if (a) $A\bar{A}$ is similar to $B\bar{B}$, and
(b) rank $A$ = rank $B$, rank $A\bar{A}$ = rank $B\bar{B}$, rank $A\bar{A}A$ = rank $B\bar{B}B$, ..., and
so on for all such alternating products with at most $n$ terms.


Problems 


1. Show that consimilarity is an equivalence relation on $M_n$.


2. Provide the details for a proof of Theorem (4.6.3). 




3. Let $A \in M_n$ be given, and let $\lambda$ be a coneigenvalue of $A$. Show that the
set of coneigenvectors of $A$ corresponding to $\lambda$ is not necessarily a
subspace of $C^n$ over $C$ but is always a subspace over $R$. Contrast with the
situation for the ordinary eigenvectors of $A$.


4. Theorem (4.6.11) gives necessary and sufficient conditions for a single
matrix to be condiagonalizable, but what if one has several matrices that
are to be condiagonalized simultaneously? Let $\{A_1, A_2, \ldots, A_k\} \subset M_n$ be
given and suppose there is a nonsingular $S \in M_n$ such that $A_i = S\Lambda_i\bar{S}^{-1}$
for $i = 1, \ldots, k$ and each $\Lambda_i$ is diagonal. Show that (a) each $A_i$ is
condiagonalizable; (b) each $A_i\bar{A}_j$ is diagonalizable; (c) the family of products
$\{A_i\bar{A}_j : i, j = 1, \ldots, k\}$ commutes; and (d) $A_i\bar{A}_j + A_j\bar{A}_i$ has only real
eigenvalues and $A_i\bar{A}_j - A_j\bar{A}_i$ has only imaginary eigenvalues for all
$i, j = 1, \ldots, k$. What does this say when $k = 1$? In fact, these necessary
conditions are also sufficient; for a proof see the paper of Hong and
Horn referenced at the end of Section (4.5).


5. The matrix $A\bar{A}$ plays an important role in the theory of consimilarity.
Show that for any $A \in M_n$, the characteristic polynomial of $A\bar{A}$ has
real coefficients and conclude that any complex eigenvalues of $A\bar{A}$ must
occur in conjugate pairs. Hint: $\det(tA - A\bar{A}A) = \det A \det(tI - \bar{A}A) =
\det(tI - A\bar{A}) \det A$. Thus, if $A$ is nonsingular, the characteristic
polynomials of $A\bar{A}$ and $\bar{A}A = \overline{(A\bar{A})}$ are the same. Consider $A_\epsilon = A + \epsilon I$ for
the general case. See Problem 8 for a more specific result about $A\bar{A}$.


6. The nonnegative eigenvalues of $A\bar{A}$ lead to coneigenvalues of $A$, but
any eigenvalues of $A\bar{A}$ that are not nonnegative also have a significance.
Suppose $A \in M_n$ and $A\bar{A}x = \lambda x$ for some $x \ne 0$ and $\lambda \in C$ such that
$\lambda \notin [0, \infty)$. Let $\alpha \in C$ be any square root of $\bar{\lambda}$ and define the vector $y$ by
$A\bar{x} = \alpha y$. Show that $A\bar{y} = \bar{\alpha}x$, $A\bar{A}y = \bar{\lambda}y$, and the vectors $x$ and $y$ are
independent. Hint: If they are dependent, $x$ must be a coneigenvector and $\lambda \ge 0$.
Conclude that all the complex eigenvalues of $A\bar{A}$ must occur in conjugate
pairs and that any negative eigenvalue of $A\bar{A}$ must have geometric
multiplicity at least two. Compare with Problem 5.


7. Let $A \in M_n$, and suppose $\lambda$ is a real strictly negative eigenvalue of $A\bar{A}$,
$A\bar{A}x = \lambda x$, $x \ne 0$, $\alpha^2 = \lambda$, $A\bar{x} = \alpha y$, $A\bar{y} = \bar{\alpha}x$. According to Problem 6, $x$
and $y$ are independent. (a) Let $x' = x + \beta y$, $y' = y - \bar{\beta}x$. Show that $A\bar{x}' =
\alpha y'$ and $A\bar{y}' = \bar{\alpha}x'$ for any choice of $\beta \in C$. (b) Show that $\beta$ can be chosen
so that $x'$ and $y'$ are orthogonal, and make such a choice for $\beta$. (c) Let
$s > 0$ be such that $\xi = sx'$ is a unit vector, and let $\eta = sy'$. Show that
$A\bar{\xi} = \alpha\eta$, $A\bar{\eta} = \bar{\alpha}\xi$, and $\xi^*\eta = 0$. (d) Let $r > 0$ be such that $r\eta$ is a unit vector
and let $U = [\xi \ \ r\eta \ \ u_3 \ \cdots \ u_n] \in M_n$ be unitary. Show that


$U^*A\bar{U} = \begin{bmatrix} 0 & r\bar{\alpha} & * \\ \alpha/r & 0 & * \\ 0 & 0 & A' \end{bmatrix}$ with $A' \in M_{n-2}$

and hence that

$U^*(A\bar{A})U = \begin{bmatrix} \lambda & 0 & * \\ 0 & \lambda & * \\ 0 & 0 & A'\bar{A}' \end{bmatrix}$


(e) Conclude that every negative eigenvalue of $A\bar{A}$ has even algebraic
multiplicity. Compare with Problem 6.


8. For any Ae M,, show that 


for ollo ajla aal 


Conclude from this explicit similarity that there is a one-to-one
correspondence between the Jordan blocks of $A\bar{A}$ and $\bar{A}A$ that have nonzero
eigenvalues. Since $\bar{A}A = \overline{A\bar{A}}$, show that the Jordan blocks of $A\bar{A}$ with
complex eigenvalues occur in conjugate pairs. Conclude that $A\bar{A}$ is
similar to a real matrix for any $A \in M_n$. Hint: See the discussion of the real
Jordan form in (3.4). Somewhat more is actually true. In fact, $A\bar{A}$ is
always similar to the square of a real matrix. What does this imply about
the eigenvalues of $A\bar{A}$?


9. If $A \in M_n$ is similar to a real matrix, show that $A$ is similar to $\bar{A}$ (and
conversely). Use this fact and Problem 8 to show that although $AB$ need
not be similar to $BA$ in general, nevertheless $A\bar{A}$ is always similar to $\bar{A}A$
for any $A \in M_n$.


10. Show that the set of condiagonalizable matrices in $M_n$ includes the
following: (a) All real diagonalizable matrices with only real eigenvalues.
(b) All diagonalizable matrices with a linearly independent set of $n$ real
eigenvectors. (c) All symmetric matrices. (d) All positive definite
Hermitian matrices. Hint: $A = HH = H(HH^T)\bar{H}^{-1}$ if $A$ is positive definite;
$H$ is Hermitian and nonsingular. (e) All matrices of the form $AB$, where
$A$ is positive definite Hermitian and $B$ is symmetric. This is the same as
the set of all matrices of the form $H^2B$ with $H$ Hermitian nonsingular
and $B$ symmetric. Hint: $H^2B = H(HBH^T)\bar{H}^{-1}$.


11. Show that the set $CD_n$ of condiagonalizable matrices in $M_n$ has the
following properties: (a) If $A \in CD_n$ and $S \in M_n$ is nonsingular, then
$SA\bar{S}^{-1} \in CD_n$. (b) The zero matrix is in $CD_n$. (c) If $A \in CD_n$ and $a \in C$,
then $aA \in CD_n$. (d) If $A \in CD_n$ is invertible, then $A^{-1} \in CD_n$.


12. Show that (a) lo ; | is not diagonalizable in the ordinary sense but it 


is condiagonalizable. (b) [| =i] is diagonalizable in the ordinary sense 


but is not condiagonalizable. (c) lo | is neither diagonalizable nor con- 
diagonalizable. 


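13. If $A \in M_n$ is such that $A\bar{A} = \Lambda = \lambda_1 I_{n_1} \oplus \cdots \oplus \lambda_k I_{n_k}$ with $\lambda_i \ne \lambda_j$ if
$i \ne j$, and all $\lambda_i \ge 0$, show that there is a unitary $U \in M_n$ such that $A =
U\Delta U^T$ and $\Delta = \Delta_1 \oplus \cdots \oplus \Delta_k$, where each $\Delta_i \in M_{n_i}$ is upper triangular.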


14. Lemma (4.6.9) says that $A \in M_n$ has the factorization $A = S\bar{S}^{-1}$ for
some nonsingular $S \in M_n$ if and only if $A\bar{A} = I$. Use (4.4.4) to show that
$A = U\bar{U}^{-1} = UU^T$ for some unitary $U \in M_n$ if and only if $A^{-1} = \bar{A}$ and $A$
is symmetric. What does this have to do with (4.4.7)?


15. Let $A \in M_n$ and write $A = B + iC$ with $B, C \in M_n(R)$. Show that
$\lambda \in C$ is a coneigenvalue of $A$ if and only if $\pm|\lambda|$ are (real) eigenvalues of
the block matrix


$\Gamma = \begin{bmatrix} B & C \\ C & -B \end{bmatrix} \in M_{2n}(R)$


Hint: Write $A\bar{x} = rx$ in terms of $x = u + iv$, $u, v \in R^n$, $r = |\lambda|$. Thus, if $\Gamma$
has no real eigenvalues, $A$ can have no coneigenvalues.


16. Show that if $A \in M_n$ is diagonal or upper triangular, then the
eigenvalues of $A$ and the coneigenvalues of $A$ are "the same" in the following
sense: If $\lambda$ is an eigenvalue of $A$, then $e^{i\theta}\lambda$ is a coneigenvalue of $A$ for all
$\theta \in R$, and if $\mu$ is a coneigenvalue of $A$, then $e^{i\theta}\mu$ is an eigenvalue of $A$ for
some $\theta \in R$.


17. If $A \in M_n(R)$, show that every real eigenvalue of $A$ is also a
coneigenvalue of $A$ and that if $\mu \ge 0$ is a coneigenvalue of $A$, then either
$\mu$ or $-\mu$ is an eigenvalue of $A$. Hint: Write $A\bar{x} = \mu x$ in terms of $x =
u + iv$, $u, v \in R^n$. Consider the example $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ to show that a real
matrix can have nonreal eigenvalues that are not associated with any
coneigenvalues.




18. What does Lemma (4.6.9) say when $n = 1$? A complex number $z$ lies
on the unit circle in the complex plane if $z\bar{z} = 1$. The usual generalization
of this condition to matrices is to require that $AA^* = I$; such matrices are
unitary and play a fundamental role in matrix theory. Another
generalization (which reduces to the same thing when $n = 1$) is to require that
$A\bar{A} = I$, and it is these matrices that Lemma (4.6.9) characterizes as being
consimilar to the identity matrix. Show that if $A \in M_n$ and $A\bar{A} = I$, then
(a) $A$ is nonsingular; (b) $A^{-1} = \bar{A}$; (c) $|\det A| = |\lambda_1 \cdots \lambda_n| = 1$; and (d) if
$Ax = \lambda x$ and $x \ne 0$, then $A\bar{x} = (1/\bar{\lambda})\bar{x}$; so $1/\bar{\lambda}$ is an eigenvalue of $A$
whenever $\lambda$ is an eigenvalue of $A$. Show that the matrix
$B = \begin{bmatrix} z & i \\ -i & z \end{bmatrix}$, $z \in R$, $z \ne \pm 1$,
has the property that the spectrum of $A = B\bar{B}^{-1}$ is

$\left\{ \frac{z-1}{z+1}, \frac{z+1}{z-1} \right\}$

so not all eigenvalues of such matrices lie on the unit circle.


19. It is a fact that every complex matrix $A \in M_n$ can be written as
$A = RE$, where $R, E \in M_n$, $R$ is similar to a real matrix, and $E\bar{E} = I$. Show
how this decomposition follows from the fact that every $A \in M_n$ is
consimilar to a real matrix and explain how it generalizes the fact that every
complex number $z$ can be written as $z = re^{i\theta}$ with $r$ and $\theta$ real.


20. Show that Theorem (4.6.11) follows from the general necessary and
sufficient conditions for two matrices to be consimilar that were stated in
the last paragraph of the text of this section. Hint: Apply the conditions
to $A$ and a diagonal matrix $\Lambda$.


21. Use the fact that every $A \in M_n$ is consimilar to a real matrix to show
that $A$ must have at least one coneigenvalue if $n$ is odd. Hint: A real matrix
$R$ of odd order has at least one real eigenvalue. What does this say about
the eigenvalues of $R^2$? If $A$ is consimilar to $R$, how is $A\bar{A}$ related to $R^2$?


22. Let $A \in M_n$ be symmetric. The discussion after Theorem (4.6.11)
shows that $A$ is condiagonalizable, so there exists a nonsingular $S \in M_n$
and a diagonal $\Lambda \in M_n$ such that $A = S\Lambda\bar{S}^{-1}$. Show that one may take $S$
to be unitary [and hence deduce Corollary (4.4.4) from Theorem (4.6.11)]
as follows: Observe that the symmetry of $A$ implies that $(S^*S)\Lambda =
\Lambda\overline{(S^*S)} = \Lambda(S^*S)^T$. Use the polar decomposition (7.3.3) to write $S =
UP$, where $U \in M_n$ is unitary, $P \in M_n$ is Hermitian, and $P = p(S^*S)$ for
some polynomial $p(t)$ [see the proof of Theorem (7.2.6)]. Deduce that
$P\Lambda = \Lambda\bar{P} = \Lambda P^T$ and hence $S\Lambda\bar{S}^{-1} = U\Lambda U^T$.


Further Readings. For more information about consimilarity and the 
problem of simultaneous condiagonalization of a family of matrices, see 




the papers of Hong and Horn referenced at the end of Sections (4.4) and 
(4.5) as well as Y. P. Hong and R. A. Horn, “A Canonical Form for Ma- 
trices under Consimilarity,” Linear Algebra Appl. 102 (1988), 143-168. 
The notion of consimilarity can be generalized by replacing the complex 
field with an arbitrary field and replacing the operation of complex con- 
jugation by an automorphism on the field; see [Jac], p. 27. 


CHAPTER 5 


Norms for vectors and matrices 


5.0 Introduction 


If one has several vectors in $C^n$ or several matrices in $M_n$, what might it
mean to say that some are "small" or that others are "large"? Under what
circumstances might we say that two vectors are "close together" or "far
apart"?

Questions of "size" and "proximity" in a two- or three-dimensional real
vector space usually refer to Euclidean distance. The Euclidean length of
a vector $z \in R^n$ is $(z^Tz)^{1/2} = (\sum z_i^2)^{1/2}$, and $z$ is said to be "small" (with
respect to this measure) if this nonnegative real number is small. The
vectors $x$ and $y$, furthermore, are "close" if the Euclidean length of the
difference $z = x - y$ is a small number.

What may be said about the “size” of matrices, which may be thought 
of as vectors in a higher-dimensional space? What about vectors in 
infinite-dimensional spaces? What about complex vectors? Are there 
useful ways to measure the “size” of real vectors other than by Euclidean 
length? 

One way to answer these questions is to study norms, or measures of 
size, of matrices and vectors. Norms may be thought of as generaliza- 
tions of Euclidean length, but the study of norms is more than an exer- 
cise in mathematical generalization. It is necessary for a proper formula- 
tion of notions such as power series of matrices, and it is essential in the 
analysis and assessment of algorithms for numerical computations. Fur- 
thermore, different acceptable norms may be more or less convenient in 
various situations, so that it is appropriate to study properties common 
to all norms, rather than to restrict attention to any single norm. 




The following examples indicate a few ways in which the need for 
norms arises. 


5.0.1 Example (convergence). If $x$ is a complex number such that
$|x| < 1$, we know that


$(1 - x)^{-1} = 1 + x + x^2 + x^3 + \cdots$

This suggests the formula

$(I - A)^{-1} = I + A + A^2 + A^3 + \cdots$


for calculating the inverse of the square matrix $I - A$, but when is it
valid? It turns out that it is sufficient that a matrix norm of $A$ be less than
1, and any such norm will do! Similarly, many other power series which
can be used to define matrix-valued functions of a matrix, such as

$e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k$

can be shown to be convergent and well-defined using norms. Norms
may also be useful in determining the number of terms required in a
power series in order to calculate a particular function value to a desired
degree of accuracy. Similar remarks may be made about the analysis of
convergence of iterative schemes to solve systems of equations.
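As a rough numerical illustration of the series for $(I - A)^{-1}$, the sketch below sums the first terms of $I + A + A^2 + \cdots$ for a small matrix and checks that the result nearly inverts $I - A$. The matrix `A`, the number of terms, and the use of the maximum absolute row sum as the matrix norm are assumptions made for this example only; matrix norms are defined later in the chapter.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_row_sum_norm(M):
    """Maximum absolute row sum, used here as the matrix norm."""
    return max(sum(abs(v) for v in row) for row in M)

def neumann_partial_sum(A, terms):
    """Return I + A + A^2 + ... + A^terms."""
    n = len(A)
    total = identity(n)
    power = identity(n)
    for _ in range(terms):
        power = matmul(power, A)   # next power of A
        total = add(total, power)  # accumulate the partial sum
    return total

# Assumed example matrix whose max-row-sum norm is 0.3 < 1.
A = [[0.2, 0.1], [0.0, 0.3]]
norm_A = max_row_sum_norm(A)

approx_inverse = neumann_partial_sum(A, 60)

# (I - A) * approx_inverse should be very close to I.
product = matmul(sub(identity(2), A), approx_inverse)
residual = max_row_sum_norm(sub(product, identity(2)))
```

Since $(I - A)(I + A + \cdots + A^N) = I - A^{N+1}$, the residual is the norm of $A^{N+1}$, which is at most `norm_A ** (N + 1)` for this norm and so shrinks geometrically with the number of terms.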


5.0.2 Example (accuracy). If $f$ is a real scalar-valued differentiable
function of a real variable, we know that if the value of $f(x)$ is known for
$x = x_0$, then its value at nearby points $x = x_0 + h$ can be estimated in terms
of the first derivative


$\frac{f(x_0 + h) - f(x_0)}{h} = \frac{\Delta f}{\Delta x} \approx f'(x_0)$


Thus, we have a way of estimating the relative error in computing the
value of $f$ at $x_0$ if we actually compute the value of $f$ at a nearby point
$x_0 + h$ instead.

The same issue arises for matrix calculations. Suppose we wish to
compute $A^{-1}$ (or some other function of $A$), but the entries of $A$ are
obtained by experiment, by analysis of other data, or from prior
calculation, and they are not known exactly. We may think of $A$ as being
composed of the "true" $A_0$ plus an error $E$, and we would like to assess
the potential "relative error" (in terms of the "size" of $E$) in computing
$A^{-1} = (A_0 + E)^{-1}$ instead of the true $A_0^{-1}$. Bounds for the disparity
between $A^{-1}$ and $A_0^{-1}$ may be as important to know as the exact value of
$A^{-1}$, and norms provide a systematic way of dealing with such questions.
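To make the question concrete, the following sketch perturbs a small matrix and compares the observed relative error in the inverse with a standard bound that is not derived in this section: if $t = \|A_0^{-1}\|\,\|E\| < 1$ for a submultiplicative norm with $\|I\| = 1$, then $\|A^{-1} - A_0^{-1}\| / \|A_0^{-1}\| \le t/(1 - t)$. The matrices `A0` and `E` and the choice of the maximum absolute row sum as the norm are assumptions for illustration.

```python
def inv2(M):
    """Inverse of a nonsingular 2x2 matrix stored as a list of rows."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_row_sum_norm(M):
    """Maximum absolute row sum, used here as the matrix norm."""
    return max(sum(abs(v) for v in row) for row in M)

# Assumed "true" matrix and a small assumed error.
A0 = [[2.0, 1.0], [1.0, 1.0]]
E = [[1e-3, 0.0], [0.0, -1e-3]]
A = add(A0, E)

A0_inv = inv2(A0)
A_inv = inv2(A)

# Observed relative error in the computed inverse.
relative_error = (max_row_sum_norm(sub(A_inv, A0_inv))
                  / max_row_sum_norm(A0_inv))

# Bound in terms of the size of E (valid because t < 1 here).
t = max_row_sum_norm(A0_inv) * max_row_sum_norm(E)
bound = t / (1 - t)
```

For these values the observed relative error is roughly 0.0013, comfortably below the bound of roughly 0.0030; the bound depends on $E$ only through its norm.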


5.0.3 Example (bounds). Bounds for important quantities associated 
with a matrix, such as eigenvalues, often involve norms, as do bounds 
for possible changes in these quantities when a matrix is perturbed. 


5.1 Defining properties of vector norms and inner products 


We first consider norms on a vector space. Since $M_n$ is a vector space,
everything we do will also apply to norms of matrices.

In order to specify properties to be required of a function if it is to be a
norm, we abstract from the familiar notion of absolute value of (real or
complex) scalars. Of course, a significant difference is that, while the
absolute value function is a real-valued function of one real or complex
variable, we require a norm to be a real-valued function of the several
variables that describe a vector. One such function on $C^n$ is Euclidean
length $(z^*z)^{1/2}$, but there are other functions that share some
fundamental properties of Euclidean length and may be more relevant measures
in some instances, may impart additional information, or may be more
convenient to use in certain contexts.

Throughout this chapter we shall consider real or complex vector 
spaces only. All of the major results hold for both fields, but within each 
result one must be consistent as to which field is used. Thus, we shall 
often state results in terms of a field F (with F = R or C at the outset) and 
then refer to the same field F in the rest of the argument. 


5.1.1 Definition. Let $V$ be a vector space over a field $F$ ($R$ or $C$). A
function $\|\cdot\|: V \to R$ is a vector norm if for all $x, y \in V$,


(1) $\|x\| \ge 0$    Nonnegative
(1a) $\|x\| = 0$ if and only if $x = 0$    Positive
(2) $\|cx\| = |c|\,\|x\|$ for all scalars $c \in F$    Homogeneous
(3) $\|x + y\| \le \|x\| + \|y\|$    Triangle inequality


These four axioms are familiar properties of Euclidean length in the plane.
Euclidean length possesses other properties that are independent of these
four axioms [e.g., the parallelogram identity (5.1.8)], which we do not
adopt as axioms because they are not essential to the general theory.

A function that satisfies axioms (1), (2), and (3), but not necessarily
(1a), is called a vector seminorm. A seminorm generalizes the notion of a
norm in that some vectors other than the zero vector are allowed to have
zero length.
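The axioms can be spot-checked numerically on a handful of vectors. The sketch below is only an illustration: the three candidate norms (sum of moduli, Euclidean length, largest modulus), the seminorm $x \mapsto |x_1|$, and the sample vectors and scalars are assumptions chosen for this example, and passing such a check on finitely many samples is of course not a proof that the axioms hold.

```python
import math

def norm_1(x):
    """Sum of the moduli of the entries."""
    return sum(abs(v) for v in x)

def norm_2(x):
    """Euclidean length (x* x)^(1/2)."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))

def norm_inf(x):
    """Largest modulus of an entry."""
    return max(abs(v) for v in x)

def first_coordinate(x):
    """Modulus of the first entry only: a seminorm, not a norm."""
    return abs(x[0])

def passes_seminorm_axioms(f, samples, scalars, tol=1e-12):
    """Spot-check axioms (1), (2), and (3) on the given samples."""
    for x in samples:
        if f(x) < 0:                                   # (1) nonnegative
            return False
        for c in scalars:                              # (2) homogeneous
            if abs(f([c * v for v in x]) - abs(c) * f(x)) > tol:
                return False
        for y in samples:                              # (3) triangle inequality
            if f([a + b for a, b in zip(x, y)]) > f(x) + f(y) + tol:
                return False
    return True

samples = [[1, 2], [-1 + 1j, 0.5], [0, 3j], [2, -2]]
scalars = [2, -1.5, 1j, 0.3 - 0.4j]

results = {f.__name__: passes_seminorm_axioms(f, samples, scalars)
           for f in (norm_1, norm_2, norm_inf, first_coordinate)}

# Axiom (1a) fails for the seminorm: a nonzero vector of zero "length".
seminorm_of_nonzero_vector = first_coordinate([0, 1])
```

All four functions pass the checks for axioms (1), (2), and (3), while `first_coordinate` assigns zero length to the nonzero vector $(0, 1)$, which is exactly the freedom a seminorm allows.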


y*Dx. 


5.1.2 Lemma. If $\|\cdot\|$ is a vector seminorm on V, then

$$\bigl|\,\|x\| - \|y\|\,\bigr| \le \|x - y\|$$

for all $x, y \in V$.

Proof: Since $y = x + (y - x)$, we have

$$\|y\| \le \|x\| + \|y - x\| = \|x\| + \|x - y\|$$

from the triangle inequality (3) and the homogeneity axiom (2). From
this it follows that

$$\|y\| - \|x\| \le \|x - y\|$$

But $x = y + (x - y)$ as well, so we have

$$\|x\| \le \|y\| + \|x - y\|$$

from the triangle inequality (3) again, and hence

$$\|x\| - \|y\| \le \|x - y\|$$

Thus, we have shown that $\pm(\|x\| - \|y\|) \le \|x - y\|$, which is equivalent to
the assertion of the lemma. □
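The inequality of Lemma (5.1.2) can be spot-checked numerically. The following is a minimal sketch, not part of the text: it assumes the illustrative seminorm $p(x) = |x_1|$ on $\mathbf{R}^2$ (the names `seminorm` and `reverse_triangle_holds` are hypothetical) and tests $|p(x) - p(y)| \le p(x - y)$ on random pairs. It also exhibits a nonzero vector of zero length, which is why p is a seminorm but not a norm.

```python
import random

def seminorm(x):
    """Illustrative seminorm on R^2 that ignores the second coordinate: p(x) = |x_1|."""
    return abs(x[0])

def sub(x, y):
    """Componentwise difference x - y."""
    return [a - b for a, b in zip(x, y)]

def reverse_triangle_holds(p, x, y, tol=1e-12):
    """Check | p(x) - p(y) | <= p(x - y), the assertion of Lemma (5.1.2)."""
    return abs(p(x) - p(y)) <= p(sub(x, y)) + tol

random.seed(0)
samples = [([random.uniform(-5, 5), random.uniform(-5, 5)],
            [random.uniform(-5, 5), random.uniform(-5, 5)]) for _ in range(1000)]
all_ok = all(reverse_triangle_holds(seminorm, x, y) for x, y in samples)
print(all_ok)

# A nonzero vector with zero "length": axiom (1a) fails, so p is only a seminorm.
print(seminorm([0.0, 3.0]))
```

A random test of this kind cannot prove the lemma; it only illustrates that the bound holds for a function satisfying axioms (1), (2), and (3).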


Associated with Euclidean length on C” is the usual Euclidean inner 
product y*x (sometimes called the “dot product”), which has something 
to do with the “angle” between two vectors: x and y are orthogonal if 
y*x =0. Just as for vector norms, one can abstract a few essential charac- 
teristics of the Euclidean inner product and use them as axioms for a 
general theory of inner products. 


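5.1.3 Definition. Let V be a vector space over the field F (R or C). A
function $\langle\cdot,\cdot\rangle : V \times V \to \mathbf{F}$ is an inner product if for all $x, y, z \in V$,

(1)  $\langle x, x\rangle \ge 0$   Nonnegative
(1a) $\langle x, x\rangle = 0$ if and only if $x = 0$   Positive
(2)  $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$   Additive
(3)  $\langle cx, y\rangle = c\langle x, y\rangle$ for all scalars $c \in \mathbf{F}$   Homogeneous
(4)  $\langle x, y\rangle = \overline{\langle y, x\rangle}$   Hermitian property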
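As an informal companion to Definition (5.1.3), the axioms can be spot-checked for the Euclidean inner product $\langle x, y\rangle = y^*x$ on $\mathbf{C}^3$. This is only a sketch under stated assumptions: the helper names (`inner`, `rand_vec`, `close`) and the random test vectors are hypothetical, and a numerical check on a few vectors is not a proof of the axioms.

```python
import random

def inner(x, y):
    """Euclidean inner product <x, y> = y* x (linear in x, conjugate-linear in y)."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))

def rand_vec(n):
    """Random vector in C^n with entries in the square [-1, 1] x [-1, 1]."""
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

def close(a, b, tol=1e-12):
    return abs(a - b) <= tol

random.seed(1)
x, y, z = rand_vec(3), rand_vec(3), rand_vec(3)
c = complex(0.5, -2.0)

# Axiom (1): <x, x> is real and nonnegative.
nonnegative = inner(x, x).real >= 0 and close(inner(x, x).imag, 0)
# Axiom (2): additivity in the first argument.
additive = close(inner([a + b for a, b in zip(x, y)], z), inner(x, z) + inner(y, z))
# Axiom (3): homogeneity in the first argument.
homogeneous = close(inner([c * a for a in x], y), c * inner(x, y))
# Axiom (4): Hermitian property.
hermitian = close(inner(x, y), inner(y, x).conjugate())

print(nonnegative, additive, homogeneous, hermitian)
```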


Exercise. Show that the Euclidean inner product $\langle x, y\rangle = y^*x$ satisfies all
four of the above axioms for an inner product.

Exercise. Let $D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$ and consider the function
$(x, y) = y^*Dx$. Which of the axioms for an inner product does $(\cdot,\cdot)$ satisfy?
Under what conditions on D is $(\cdot,\cdot)$ an inner product?

Exercise. Deduce the following properties of an inner product from the
four axioms in Definition (5.1.3):

$$\langle x, cy\rangle = \bar{c}\,\langle x, y\rangle$$
$$\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$$
$$\langle ax + by,\, cw + dz\rangle = a\bar{c}\langle x, w\rangle + b\bar{c}\langle y, w\rangle + a\bar{d}\langle x, z\rangle + b\bar{d}\langle y, z\rangle$$
$$\langle x, y\rangle = 0 \text{ for all } y \in V \text{ if and only if } x = 0$$
$$|\langle x, \langle x, y\rangle y\rangle| = |\langle x, y\rangle|^2$$

An important property shared by all inner products is the Cauchy-
Schwarz inequality.

5.1.4 Theorem (Cauchy-Schwarz inequality). If $\langle\cdot,\cdot\rangle$ is an inner
product on a vector space V over the field F (R or C), then

$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\langle y, y\rangle \quad \text{for all } x, y \in V$$

Equality occurs if and only if x and y are linearly dependent, that is,
$x = \alpha y$ or $y = \alpha x$ for some $\alpha \in \mathbf{F}$.

Proof: Let $x, y \in V$ be given. If $y = 0$, the assertion is trivial, so we
may assume that $y \ne 0$. Let $t \in \mathbf{R}$ and consider
$p(t) = \langle x + ty, x + ty\rangle = \langle x, x\rangle + t\langle y, x\rangle + t\langle x, y\rangle + t^2\langle y, y\rangle = \langle x, x\rangle + 2t\,\mathrm{Re}\langle x, y\rangle + t^2\langle y, y\rangle$,
which is a real quadratic polynomial with real coefficients. Because of axiom
(5.1.3(1)), we know that $p(t) \ge 0$ for all real t, and hence p(t) can have no
real simple roots. The discriminant of p(t) must therefore be nonpositive

$$(2\,\mathrm{Re}\langle x, y\rangle)^2 - 4\langle y, y\rangle\langle x, x\rangle \le 0$$

and hence

$$(\mathrm{Re}\langle x, y\rangle)^2 \le \langle x, x\rangle\langle y, y\rangle \qquad (5.1.5)$$

Since this inequality must hold for any pair of vectors, it must hold if y is
replaced by $\langle x, y\rangle y$, so we also have the inequality

$$(\mathrm{Re}\langle x, \langle x, y\rangle y\rangle)^2 \le \langle x, x\rangle\langle y, y\rangle\,|\langle x, y\rangle|^2$$

But $\mathrm{Re}\langle x, \langle x, y\rangle y\rangle = \mathrm{Re}\,\overline{\langle x, y\rangle}\langle x, y\rangle = \mathrm{Re}\,|\langle x, y\rangle|^2 = |\langle x, y\rangle|^2$, so

$$|\langle x, y\rangle|^4 \le \langle x, x\rangle\langle y, y\rangle\,|\langle x, y\rangle|^2 \qquad (5.1.6)$$

If $\langle x, y\rangle = 0$, then the statement of the theorem is trivial; if not, then
we may divide (5.1.6) by the quantity $|\langle x, y\rangle|^2$ to obtain the desired
inequality. Because of axiom (1a), p(t) can have a real (double) root only
if $x + ty = 0$ for some t. Thus, equality can occur in the discriminant
condition (5.1.5) if and only if x and y are linearly dependent. □

5.1.7 Corollary. If $\langle\cdot,\cdot\rangle$ is an inner product on V, then
$\|x\| = \langle x, x\rangle^{1/2}$ is a vector norm on V.

Exercise. Prove (5.1.7). Hint: The only nontrivial axiom to verify is
the triangle inequality. Compute $\|x + y\|^2$ and use the Cauchy-Schwarz
inequality.

If $\|\cdot\|$ is a vector norm such that $\|x\| = \langle x, x\rangle^{1/2}$ for some inner product
$\langle\cdot,\cdot\rangle$, then we say that the vector norm $\|\cdot\|$ is derived from an inner
product (namely, from $\langle\cdot,\cdot\rangle$).

Problems

1. Let $e_i$ denote the ith unit coordinate vector in $\mathbf{C}^n$ and suppose that $\|\cdot\|$
is a vector seminorm on $\mathbf{C}^n$. Show that

$$\|x\| \le \sum_{i=1}^n |x_i|\,\|e_i\|$$

2. If $\|\cdot\|$ is a vector seminorm on V, show that $V_0 = \{v \in V : \|v\| = 0\}$ is a
subspace of V (called the null space of $\|\cdot\|$). (a) If $V_1$ is any subspace of V
such that $V_0 \cap V_1 = \{0\}$, show that $\|\cdot\|$ is a vector norm on $V_1$. (b) Consider
the relation $x \sim y$ defined by

$$x \sim y \text{ if and only if } \|x - y\| = 0$$

Show that $\sim$ is an equivalence relation on V, that the cosets of this
equivalence relation are of the form $\hat{x} = \{x + y \in V : y \in V_0\}$, and that the set of
these cosets forms a vector space in a natural way. Show that the function
$\|\hat{x}\| = \{\|x\| : x \in \hat{x}\}$ is well defined and is a vector norm on the vector
space of cosets. (c) Explain why there is a natural norm associated with
every vector seminorm. (d) Is $\|x\| = 0$ a seminorm? (e) Give an example of
a nontrivial seminorm that is not a norm.

3. Show that if we define the "angle" between the nonzero vectors x and
y to be the value of

$$\cos^{-1}\left(\frac{|\langle x, y\rangle|}{(\langle x, x\rangle\langle y, y\rangle)^{1/2}}\right)$$

that lies between 0 and $\pi/2$, then this notion of angle is well defined for
any inner product.

4. Show that any vector norm derived from an inner product as in (5.1.7)
must satisfy the parallelogram identity

$$\tfrac{1}{2}\left(\|x + y\|^2 + \|x - y\|^2\right) = \|x\|^2 + \|y\|^2 \qquad (5.1.8)$$

Why is this identity so named? The equation (5.1.8) is, in fact, necessary
and sufficient that a given norm $\|\cdot\|$ be derived from an inner product.
See Problem 10.

5. Consider the function $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$ defined on $\mathbf{C}^n$. Show that
$\|\cdot\|_\infty$ is a vector norm that cannot be derived from an inner product.

6. If $\|\cdot\|$ is a vector norm derived from an inner product $\langle\cdot,\cdot\rangle$, show that

$$\mathrm{Re}\langle x, y\rangle = \tfrac{1}{2}\left(\|x + y\|^2 - \|x\|^2 - \|y\|^2\right) \qquad (5.1.9)$$

This is known as the polarization identity. Show also that

$$\mathrm{Re}\langle x, y\rangle = \tfrac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right)$$

7. Show that the $l_1$ norm $\|x\|_1 = |x_1| + \cdots + |x_n|$ on $\mathbf{C}^n$ satisfies the axioms
(5.1.1) but does not obey the polarization identity (5.1.9). It is not,
therefore, derived from any inner product.

8. If $\|\cdot\|$ is a vector norm on V derived from an inner product, then

$$\|x + y\|\,\|x - y\| \le \|x\|^2 + \|y\|^2$$

for all $x, y \in V$. When does equality hold? Does this inequality hold for
all vector norms? Give a geometric interpretation of this inequality.

9. Let x and y be given vectors in V, which has a norm $\|\cdot\|$ derived from
an inner product $\langle\cdot,\cdot\rangle$, and suppose that y is nonzero. Show that the
scalar $\alpha_0$ that minimizes the value of $\|x - \alpha y\|$ is $\alpha_0 = \langle x, y\rangle / \|y\|^2$, and
that $x - \alpha_0 y$ and y are orthogonal.

10. It is not difficult to show that the parallelogram identity (5.1.8) is a
sufficient condition for a given norm to be derived from an inner product,
but some ingenuity is required. First consider the case of a vector space V
over R. Let $\|\cdot\|$ be a given norm on V. (a) Define

$$\langle x, y\rangle = \tfrac{1}{2}\left(\|x + y\|^2 - \|x\|^2 - \|y\|^2\right) \qquad (5.1.10)$$

Show that $\langle\cdot,\cdot\rangle$ defined in this way satisfies axioms (1), (1a), and (4) in
(5.1.3) and that $\langle x, x\rangle = \|x\|^2$. (b) Use (5.1.8) to show that

$$\begin{aligned}
4\langle x, y\rangle + 4\langle z, y\rangle &= 2\|x + y\|^2 + 2\|z + y\|^2 - 2\|x\|^2 - 2\|z\|^2 - 4\|y\|^2 \\
&= \|x + 2y + z\|^2 - \|x + z\|^2 - 4\|y\|^2 = 4\langle x + z, y\rangle
\end{aligned}$$

and conclude that the additivity axiom (2) in (5.1.3) is satisfied. (c) Use
the additivity axiom to show that $\langle nx, y\rangle = n\langle x, y\rangle$ and
$m\langle m^{-1}nx, y\rangle = \langle nx, y\rangle = n\langle x, y\rangle$ whenever m and n are nonnegative integers.
Use (5.1.8) and (5.1.10) to show that $\langle -x, y\rangle = -\langle x, y\rangle$ and conclude that
$\langle ax, y\rangle = a\langle x, y\rangle$ whenever $a \in \mathbf{R}$ is rational. (d) Let
$p(t) = t^2\|x\|^2 + 2t\langle x, y\rangle + \|y\|^2$, $t \in \mathbf{R}$, and show that $p(t) = \|tx + y\|^2$ if t is rational.
Conclude from the continuity of p(t) that $p(t) \ge 0$ for all $t \in \mathbf{R}$. Deduce the
Cauchy-Schwarz inequality $|\langle x, y\rangle|^2 \le \|x\|^2\,\|y\|^2$ from the fact that the
discriminant of p(t) must be nonpositive. (e) Now let $a \in \mathbf{R}$ be given. Show that

$$\begin{aligned}
|\langle ax, y\rangle - a\langle x, y\rangle| &= |\langle (a - b)x, y\rangle + (b - a)\langle x, y\rangle| \\
&\le |\langle (a - b)x, y\rangle| + |(b - a)\langle x, y\rangle| \le 2|a - b|\,\|x\|\,\|y\|
\end{aligned}$$

for any rational b, and observe that the upper bound can be made
arbitrarily small. Conclude that the homogeneity axiom (3) in (5.1.3) is
satisfied. This shows that $\langle\cdot,\cdot\rangle$ is an inner product on V.

A careful reader will observe that the triangle inequality for the norm
$\|\cdot\|$ [axiom (3) in (5.1.1)] is not used in this argument. Thus, the axioms
(1), (1a), and (2) in (5.1.1) together with (5.1.8) imply that the function
$\|\cdot\|$ is derived from an inner product, is therefore a norm, and hence must
satisfy the triangle inequality. (f) If V is a complex vector space, define

$$\langle x, y\rangle = \tfrac{1}{2}\left(\|x + y\|^2 - \|x\|^2 - \|y\|^2\right) + \tfrac{i}{2}\left(\|x + iy\|^2 - \|x\|^2 - \|y\|^2\right)$$

The real part of $\langle x, y\rangle$ is an inner product of V considered as a vector
space over R. Use this fact and (5.1.8) to show that $\langle\cdot,\cdot\rangle$ is an inner
product for V as a vector space over C.

Further Reading. The first proof that the parallelogram identity is both
necessary and sufficient for a given vector norm to be derived from
an inner product seems to be due to P. Jordan and J. Von Neumann,
"On Inner Products in Linear Metric Spaces," Ann. Math. 36(2) (1935),
719-723. The outline of a proof of this result given in Problem 10 follows
D. Fearnley-Sander and J. S. V. Symons, "Apollonius and Inner Products,"
Amer. Math. Monthly 81 (1974), 990-993.


5.2 Examples of vector norms

The following are some examples of frequently encountered vector norms.

5.2.1 The Euclidean norm (or $l_2$ norm) on $\mathbf{C}^n$ is

$$\|x\|_2 = \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2}$$

This is perhaps the best known vector norm since $\|x - y\|_2$ measures the
standard Euclidean distance between two points $x, y \in \mathbf{C}^n$. This norm
is also derived from the usual Euclidean inner product; that is,
$\|x\|_2^2 = \langle x, x\rangle = x^*x$.

Exercise. Verify that $\|\cdot\|_2$ is a vector norm on $\mathbf{C}^n$.

Exercise. A norm $\|\cdot\|$ is said to be unitarily invariant if $\|Ux\| = \|x\|$ for all
$x \in \mathbf{C}^n$ and all unitary matrices $U \in M_n$. Show that the Euclidean norm
$\|\cdot\|_2$ is unitarily invariant.

5.2.2 The sum norm (or $l_1$ norm) on $\mathbf{C}^n$ is

$$\|x\|_1 = |x_1| + \cdots + |x_n|$$

This norm is also called the one-norm or, more picturesquely, the
Manhattan norm because of the rectilinear measurement of length in
coordinate directions only.

Exercise. Verify that the sum norm is a vector norm on $\mathbf{C}^n$, but that it is
not derived from an inner product. Hint: Use (5.1.8).

5.2.3 The max norm (or $l_\infty$ norm) on $\mathbf{C}^n$ is

$$\|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}$$

Exercise. Verify that $\|\cdot\|_\infty$ is a vector norm on $\mathbf{C}^n$.

Exercise. Is $\|\cdot\|_\infty$ derived from an inner product?

5.2.4 The $l_p$ norm on $\mathbf{C}^n$ is

$$\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$$

Exercise. Verify that each $l_p$ norm for $p \ge 1$ is a vector norm on $\mathbf{C}^n$ and
that $\|x\|_\infty = \lim_{p \to \infty} \|x\|_p$ for each $x \in \mathbf{C}^n$. Hint: The triangle inequality is
the only nontrivial axiom to verify. The triangle inequality for the $l_p$
norms is a classical inequality known as Minkowski's inequality.

Exercise. Give an example of a vector norm that is not an $l_p$ norm.

The foregoing examples of vector norms have all been norms on $\mathbf{C}^n$,
but they can be used to create vector norms on any finite-dimensional
real or complex vector space V. If $\mathcal{B} = \{b^{(1)}, \ldots, b^{(n)}\}$ is a basis for V,
then recall that

$$x \to [x]_{\mathcal{B}} = [x_1, \ldots, x_n]^T \in \mathbf{C}^n, \qquad x = \sum_{i=1}^n x_i b^{(i)}$$

is an isomorphism of V onto $\mathbf{C}^n$. If $\|\cdot\|$ is any vector norm on $\mathbf{C}^n$, then

$$\|x\|_{\mathcal{B}} = \|[x]_{\mathcal{B}}\|$$

is easily shown to be a vector norm on V.

Exercise. Verify the last assertion.

A matrix $B \in M_n$ is said to be an isometry for the vector norm $\|\cdot\|$ on
$\mathbf{C}^n$ if

$$\|Bx\| = \|x\| \quad \text{for all } x \in \mathbf{C}^n$$

Exercise. Show that an isometry for any vector norm must be a
nonsingular matrix.

Exercise. Show that the set of isometries for a given norm forms a group
(known as the isometry group of the norm). Are there any isometries for
$\|\cdot\|_2$ besides the unitary matrices?

Exercise. Show that the isometry group of the sum norm is the set (group)
of all matrices that look like permutation matrices except that the "+1"
entries are replaced by arbitrary complex numbers with absolute value 1.

Exercise. What is the isometry group of the max norm?

The definition of a vector norm does not require that the vector space
V be finite-dimensional. The space V might, for example, be the vector
space C[a, b] of all continuous real- or complex-valued functions on the
real interval [a, b].

5.2.5 Example. Some examples of norms on C[a, b] are similar to
norms already defined for $\mathbf{C}^n$. For example,

$$\|f\|_2 = \left[\int_a^b |f(t)|^2\,dt\right]^{1/2} \qquad L_2 \text{ norm}$$
$$\|f\|_1 = \int_a^b |f(t)|\,dt \qquad L_1 \text{ norm}$$
$$\|f\|_p = \left[\int_a^b |f(t)|^p\,dt\right]^{1/p},\ p \ge 1 \qquad L_p \text{ norm}$$
$$\|f\|_\infty = \max\{|f(x)| : x \in [a, b]\} \qquad L_\infty \text{ norm}$$

are all norms on C[a, b].

Problems

1. Show that if $0 < p < 1$, then (5.2.4) defines a function on $\mathbf{C}^n$ that
satisfies all but one of the axioms for a vector norm. Which one fails? Give an
example.

2. Let $f \in C[0, 1]$. Show that $\|f\|_\infty = \lim_{p \to \infty} \|f\|_p$.

3. What does the triangle inequality look like for $\|\cdot\|_p$ on C[0, 1]? How
could you prove it starting from Minkowski's inequality (Appendix B)
for $\mathbf{C}^n$?

4. Let $p_1, p_2, \ldots, p_n$ be given positive real numbers. Which of the
following is a vector norm on $\mathbf{C}^n$?

(a) $\|x\| = \sum_{i=1}^n p_i |x_i|$
(b) $\|x\| = \left(\sum_{i=1}^n p_i |x_i|^2\right)^{1/2}$
(c) $\|x\| = \max\{p_1|x_1|, \ldots, p_n|x_n|\}$

5. Let $x_0 \in [a, b]$ be a given point. Show that the function $\|f\|_{x_0} = |f(x_0)|$
on C[a, b] is a seminorm that is not a norm if $a < b$.

6. If $\|\cdot\|$ is a unitarily invariant vector norm on $\mathbf{C}^n$, show that
$\|\cdot\| = \alpha\|\cdot\|_2$ for some $\alpha > 0$ and that $\|\cdot\|_2$ is the only unitarily invariant vector
norm for which $\|e_1\| = 1$.

7. Show that $\|x\|_\infty = \max_{\|y\|_1 = 1} |y^*x|$ and that $\|x\|_1 = \max_{\|y\|_\infty = 1} |x^*y|$.

8. Use the preceding exercise to show that if $A^*$ is in the isometry group
of the sum norm, then A is in the isometry group of the max norm, and
vice versa.

9. What is the intersection of all the isometry groups of all the $l_p$ norms?

Further Readings. For a detailed discussion of the classical inequalities
of Minkowski and Hölder, see [BB].
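The $l_p$ norms of this section are easy to experiment with. The sketch below is an illustration only: the function names (`lp_norm`, `max_norm`, `parallelogram_gap`) and the test vectors are assumptions, not anything from the text. It evaluates $\|x\|_p$ for increasing p to suggest the limit $\|x\|_\infty = \lim_{p\to\infty}\|x\|_p$, and it evaluates the two sides of the parallelogram identity (5.1.8) for the $l_2$, $l_1$, and $l_\infty$ norms on the unit coordinate vectors of $\mathbf{R}^2$.

```python
def lp_norm(x, p):
    """l_p norm (sum_i |x_i|^p)^(1/p) of a sequence of real or complex numbers."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def max_norm(x):
    """l_infinity norm max_i |x_i|."""
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0 + 1.0j]
for p in (1, 2, 4, 16, 64):
    print(p, lp_norm(x, p))      # values decrease toward the max norm as p grows
print("max", max_norm(x))

def parallelogram_gap(norm, x, y):
    """(||x+y||^2 + ||x-y||^2)/2 - (||x||^2 + ||y||^2); zero whenever (5.1.8) holds."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return 0.5 * (norm(s) ** 2 + norm(d) ** 2) - (norm(x) ** 2 + norm(y) ** 2)

e1, e2 = [1.0, 0.0], [0.0, 1.0]
gap_l2 = parallelogram_gap(lambda v: lp_norm(v, 2), e1, e2)   # essentially zero
gap_l1 = parallelogram_gap(lambda v: lp_norm(v, 1), e1, e2)   # nonzero
gap_max = parallelogram_gap(max_norm, e1, e2)                 # nonzero
print(gap_l2, gap_l1, gap_max)
```

A nonzero gap for a single pair of vectors already shows that the norm in question fails (5.1.8); a zero gap for one pair, of course, proves nothing by itself.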


5.3 Algebraic properties of vector norms

From any given norm or norms, new norms may be constructed in
several ways. For example, it is easy to show that the sum of two vector
(semi)norms is a vector (semi)norm and any positive multiple of a vector
(semi)norm is again a vector (semi)norm. In a different vein, one can also
show easily that if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are vector norms, then the function $\|\cdot\|$
defined by $\|x\| = \max\{\|x\|_\alpha, \|x\|_\beta\}$ is also a vector norm. These
observations are all special cases of the following result.

5.3.1 Theorem. Let $\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_m}$ be m given vector norms on a
vector space V over the field F (R or C), and let $\|\cdot\|_\beta$ be a vector norm on $\mathbf{R}^m$
such that $\|y\|_\beta \le \|y + z\|_\beta$ for all vectors $y, z \in \mathbf{R}^m$ with nonnegative entries.
Then the function $\|\cdot\| : V \to \mathbf{R}$ defined by
$\|x\| = \bigl\|[\|x\|_{\alpha_1}, \ldots, \|x\|_{\alpha_m}]^T\bigr\|_\beta$ is a
vector norm on V.

The monotonicity assumption on the norm $\|\cdot\|_\beta$ in the theorem ensures
that the constructed function $\|\cdot\|$ satisfies the triangle inequality. All the
$l_p$ norms have this monotonicity property, as does any vector norm $\|x\|$
on $\mathbf{R}^m$ that is a function only of the absolute values of the entries of x;
see (5.5.9-10). There are, however, vector norms that do not have this
property.

Exercise. Prove Theorem (5.3.1).

Exercise. Show that the fact that the sum or max of two vector norms is a
vector norm is a special case of (5.3.1). What about the min?

Exercise. Let $m = 2$, $V = \mathbf{R}^2$, and $\|x\|_\beta = |x_1 - x_2| + |x_2|$. Show that
$\|\cdot\|_\beta$ is a vector norm on $\mathbf{R}^2$ but the function
$\|x\| = \bigl\|[\|x\|_\infty, \|x\|_1]^T\bigr\|_\beta = \min\{|x_1|, |x_2|\} + |x_1| + |x_2|$
is not a vector norm. Which of the vector norm axioms does $\|\cdot\|$ satisfy?
Hint: Consider $x = [1, 0]^T$, $y = [0, 1]^T$, $\|x + y\|$, $\|x\|$, and $\|y\|$.
Does this contradict (5.3.1)?

Another way to construct new norms from old is given by the
following result.

5.3.2 Theorem. If $\|\cdot\|$ is a vector norm on $\mathbf{C}^n$ and if $T \in M_n$ is
nonsingular, then $\|\cdot\|_T$ defined by $\|x\|_T = \|Tx\|$, $x \in \mathbf{C}^n$, is also a vector
norm on $\mathbf{C}^n$.

Exercise. Prove Theorem (5.3.2).

Exercise. What happens in (5.3.2) if T is singular?

Exercise. Why must $\|x\| = \left(|2x_1 - 3x_2|^2 + |x_2|^2\right)^{1/2}$ be a norm on $\mathbf{C}^2$ (no
computations, please!)?

New norms can be constructed from old ones by using the notion of
duality. This method is discussed at the end of Section (5.4).

Problems

1. If $\|\cdot\|$ is a vector seminorm, show that $\|x\|_T = \|Tx\|$ is also a vector
seminorm for any $T \in M_n$. If $\|\cdot\|$ is actually a vector norm, then $\|\cdot\|_T$ is a
vector seminorm whose null space is the null space of T.

2. Show that any vector seminorm is of the form $\|\cdot\|_T$ for some vector
norm $\|\cdot\|$ and some $T \in M_n$.


5.4 Analytic properties of vector norms

The examples in the preceding two sections make it clear that there are
many different functions $\|\cdot\| : V \to \mathbf{R}$ that satisfy the axioms for a norm. It
is useful to have many different norms available because one norm may
be more convenient or more appropriate than another for a given
purpose. For example, the $l_2$ norm is often convenient to use in optimization
problems because it is continuously differentiable (except at the origin).
On the other hand, the $l_1$ norm, while differentiable on a smaller set, is
popular in statistics because it leads to estimators that can be more
robust than the classical regression estimators. The $l_\infty$ norm is often the
most natural one to use, since it directly monitors element-by-element
convergence, but, unfortunately, it can be analytically and algebraically
awkward to use. In actual applications, the norm on which theory is
most naturally based and the norm that is most easily calculated in a
given situation may not coincide. It is important, therefore, to know
what relationship there may be between two different norms.
Fortunately, in the finite-dimensional case all norms are "equivalent" in a
certain strong sense.

A basic notion in analysis is that of convergence of a sequence, and
vector norms can be used to measure convergence of a sequence of vectors.

5.4.1 Definition. Let V be a vector space over R or C and let $\|\cdot\|$ be a
norm on V. We say that the sequence $\{x^{(k)}\}$ of vectors in V converges to
a vector $x \in V$ with respect to the norm $\|\cdot\|$ if and only if $\|x^{(k)} - x\| \to 0$
as $k \to \infty$.
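Definition (5.4.1) can be illustrated with a small numerical experiment. The sketch below is only an example under stated assumptions: the sequence `x_k`, its limit, and the helper names are hypothetical choices made for illustration. It prints $\|x^{(k)} - x\|$ in the $l_1$, $l_2$, and $l_\infty$ norms for a sequence in $\mathbf{R}^2$ and shows the error shrinking in all three.

```python
import math

def l1(v):
    return sum(abs(t) for t in v)

def l2(v):
    return math.sqrt(sum(abs(t) ** 2 for t in v))

def linf(v):
    return max(abs(t) for t in v)

def x_k(k):
    """Hypothetical example sequence in R^2 whose limit is [0, 1]."""
    return [1.0 / k, 1.0 + (-1.0) ** k / k ** 2]

limit = [0.0, 1.0]

def error(norm, k):
    """||x^(k) - x|| measured in the given norm."""
    return norm([a - b for a, b in zip(x_k(k), limit)])

for k in (1, 10, 100, 1000):
    print(k, error(l1, k), error(l2, k), error(linf, k))
```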


If $\{x^{(k)}\}$ converges to x with respect to the norm $\|\cdot\|$, we write

$$x^{(k)} \to x \quad \text{or} \quad \lim_{k \to \infty} x^{(k)} = x \quad \text{with respect to } \|\cdot\|$$

It must be made clear which norm is involved in the convergence in
question; the issue arises as to whether a given sequence of vectors can
converge with respect to one norm but not with respect to another. This
ambiguity can happen in an infinite-dimensional vector space.

5.4.2 Example. Consider the sequence $\{f_k\}$ of functions in C[0, 1] (the
vector space of all real-valued or complex-valued continuous functions
on [0, 1]) defined by

$$f_k(x) = 0, \qquad 0 \le x \le \tfrac{1}{k}$$
$$f_k(x) = 2\left(k^{3/2}x - k^{1/2}\right), \qquad \tfrac{1}{k} \le x \le \tfrac{3}{2k}$$
$$f_k(x) = 2\left(-k^{3/2}x + 2k^{1/2}\right), \qquad \tfrac{3}{2k} \le x \le \tfrac{2}{k}$$
$$f_k(x) = 0, \qquad \tfrac{2}{k} \le x \le 1$$

for k = 2, 3, 4, .... One may then calculate that

$$\|f_k\|_1 = \tfrac{1}{2}k^{-1/2} \to 0 \quad \text{as } k \to \infty$$
$$\|f_k\|_2 = \tfrac{1}{\sqrt{3}} \quad \text{for all } k$$
$$\|f_k\|_\infty = k^{1/2} \to \infty \quad \text{as } k \to \infty$$

Thus, $\lim_{k \to \infty} f_k = 0$ with respect to the $L_1$ norm but not with respect to
the other two norms.

Exercise. Sketch the functions described in the preceding example and
verify the assertions made about the $L_1$, $L_2$, and $L_\infty$ norms.

Exercise. If $\|\cdot\|$ is a vector norm, if $x^{(k)} \to x$ and if $x^{(k)} \to y$, use the
triangle inequality to show that $x = y$. Thus it makes sense to talk about
the limit of a sequence (if any) with respect to a given norm.

Fortunately, the phenomenon in Example (5.4.2) cannot occur in the
case of a finite-dimensional vector space. In order to see this, we need a
general lemma about the continuity properties of norms.

5.4.3 Lemma. Let $\|\cdot\|$ be a norm on a vector space V over the field F
(R or C), and let $x^{(1)}, x^{(2)}, \ldots, x^{(m)} \in V$ be given vectors. The function
$g : \mathbf{F}^m \to \mathbf{R}$ defined by

$$g(z_1, z_2, \ldots, z_m) = \|z_1x^{(1)} + z_2x^{(2)} + \cdots + z_mx^{(m)}\|$$

is a uniformly continuous function.

Proof: Let $u = \sum_{i=1}^m u_ix^{(i)}$ and $v = \sum_{i=1}^m v_ix^{(i)}$, and calculate

$$\begin{aligned}
|g(u_1, \ldots, u_m) - g(v_1, \ldots, v_m)| &= \bigl|\,\|u\| - \|v\|\,\bigr| \le \|u - v\| \\
&= \Bigl\|\sum_{i=1}^m (u_i - v_i)x^{(i)}\Bigr\| \\
&\le \sum_{i=1}^m |u_i - v_i|\,\|x^{(i)}\| \le C \max_{1 \le i \le m} |u_i - v_i|
\end{aligned}$$

where $C = m \max_{1 \le i \le m} \|x^{(i)}\|$. The first inequality comes from Lemma
(5.1.2). Notice that the finite constant C depends only upon the norm
$\|\cdot\|$ and the m vectors $x^{(1)}, \ldots, x^{(m)}$. If the vectors $x^{(i)}$ are all the zero
vector, there is nothing to show, and, if not, then $C > 0$. In order to have
$|g(u_1, \ldots, u_m) - g(v_1, \ldots, v_m)| < \epsilon$, we need only choose $|u_i - v_i| < \epsilon/C$. □

Although the vector space V need not be finite-dimensional for the
lemma, it is important that the number of vectors $x^{(i)}$ be finite.

Exercise. Deduce from the lemma that every vector norm on $\mathbf{R}^n$ or $\mathbf{C}^n$ is a
uniformly continuous function.

Finite dimensionality of V is, however, essential for the following
fundamental fact.

5.4.4 Theorem. Let $f_1$ and $f_2$ be two real-valued functions on a
finite-dimensional vector space V over the field F (R or C), and let
$\mathcal{B} = \{x^{(1)}, \ldots, x^{(n)}\}$ be a basis for V. Assume that $f_1$ and $f_2$ are

(a) Positive: $f_i(x) \ge 0$ for all $x \in V$, $f_i(x) = 0$ if and only if $x = 0$;
(b) Homogeneous: $f_i(\alpha x) = |\alpha| f_i(x)$ for all $\alpha \in \mathbf{F}$ and all $x \in V$; and
(c) Continuous: $f_i(x(z))$ is continuous on $\mathbf{F}^n$, where
$z = [z_1, \ldots, z_n]^T \in \mathbf{F}^n$ and $x(z) = z_1x^{(1)} + \cdots + z_nx^{(n)}$.

Then there exist finite positive constants $C_m$ and $C_M$ such that

$$C_m f_1(x) \le f_2(x) \le C_M f_1(x)$$

for all $x \in V$.

Proof: Define $h(z) = f_2(x(z))/f_1(x(z))$ on the Euclidean unit sphere
$S = \{z \in \mathbf{F}^n : \|z\|_2 = 1\}$, a compact set in $\mathbf{F}^n$. Notice that the denominator of
h(z) does not vanish on S by (a), and therefore h(z) is continuous on S
by (c). By the Weierstrass theorem (see Appendix E), the continuous
function h achieves a finite positive maximum $C_M$ and a positive
minimum $C_m$ on the compact set S and hence

$$C_m f_1(x(z)) \le f_2(x(z)) \le C_M f_1(x(z))$$

for all $z \in S$. Because $z/\|z\|_2 \in S$ for every nonzero $z \in \mathbf{F}^n$, (b) ensures that
these inequalities hold for all nonzero $z \in \mathbf{F}^n$; they hold trivially for $z = 0$
since $f_i(0) = 0$. But every $x \in V$ is of the form $x = x(z)$ for some $z \in \mathbf{F}^n$
because $\mathcal{B}$ is a basis, so the asserted inequalities hold for all $x \in V$. □
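The proof of Theorem (5.4.4) suggests a direct numerical experiment: scan the ratio $h = f_2/f_1$ over the Euclidean unit sphere and record its extreme values. The sketch below does this on $\mathbf{R}^2$ with $f_1 = \|\cdot\|_2$ and $f_2 = \|\cdot\|_1$. It is an illustration under assumptions (the name `ratio_bounds_2d` and the sampling grid are hypothetical), and sampling only approximates the true extrema, which for this pair are 1 and $\sqrt{2}$.

```python
import math

def l1(x):
    return sum(abs(t) for t in x)

def l2(x):
    return math.sqrt(sum(t * t for t in x))

def ratio_bounds_2d(f1, f2, samples=3600):
    """Approximate min and max of h = f2/f1 over the Euclidean unit circle in R^2."""
    values = []
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        z = (math.cos(theta), math.sin(theta))   # a point with ||z||_2 = 1
        values.append(f2(z) / f1(z))
    return min(values), max(values)

c_m, c_M = ratio_bounds_2d(l2, l1)
print(c_m, c_M)          # approximately 1 and sqrt(2)

# By homogeneity the same constants bound the ratio for every nonzero vector.
x = (3.0, -4.0)
print(c_m * l2(x) <= l1(x) <= c_M * l2(x) + 1e-9)
```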


Definition. Let V be a real or complex vector space. A function $f : V \to \mathbf{R}$
that satisfies the three hypotheses of positivity, homogeneity, and
continuity in Theorem (5.4.4) is said to be a pre-norm.

The most important example of a class of pre-norms is, of course, the
vector norms; Lemma (5.4.3) says that every vector norm satisfies the
continuity assumption (c) of Theorem (5.4.4). A pre-norm that satisfies
the triangle inequality is a vector norm. Because of the importance of this
class, we state the result in this case as the following corollary.

5.4.5 Corollary. Let $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ be any two vector norms on a
finite-dimensional real or complex vector space V. Then there exist finite
positive constants $C_m$ and $C_M$ such that $C_m\|x\|_\alpha \le \|x\|_\beta \le C_M\|x\|_\alpha$ for all
$x \in V$.

Exercise. How does (5.4.5) break down for vector seminorms?

Exercise. Let $x = [x_1, x_2]^T \in \mathbf{R}^2$ and consider the following norms on $\mathbf{R}^2$:
$\|x\|_\alpha = \|[10x_1, x_2]^T\|_\infty$ and $\|x\|_\beta = \|[x_1, 10x_2]^T\|_\infty$. Show that the function
$f(x) = (\|x\|_\alpha\,\|x\|_\beta)^{1/2}$ is a pre-norm on $\mathbf{R}^2$ that is not a norm. See Problem
15 at the end of this section. Hint: Consider $f([1, 1]^T)$, $f([0, 1]^T)$, and
$f([1, 0]^T)$.

Exercise. If $\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_m}$ are vector norms on V, show that
$f(x) = (\|x\|_{\alpha_1} \cdots \|x\|_{\alpha_m})^{1/m}$ and $h(x) = \min\{\|x\|_{\alpha_1}, \ldots, \|x\|_{\alpha_m}\}$ are pre-norms on V
that are not necessarily vector norms.

One consequence of (5.4.5) is the fact that convergence (in norm) of a
sequence of vectors in a finite-dimensional complex vector space is
independent of the norm used.

5.4.6 Corollary. If $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are vector norms on a
finite-dimensional real or complex vector space, and if $\{x^{(k)}\}$ is a given sequence
of vectors, then $\lim_{k \to \infty} x^{(k)} = x$ with respect to $\|\cdot\|_\alpha$ if and only if
$\lim_{k \to \infty} x^{(k)} = x$ with respect to $\|\cdot\|_\beta$.

Proof: Since $C_m\|x^{(k)} - x\|_\alpha \le \|x^{(k)} - x\|_\beta \le C_M\|x^{(k)} - x\|_\alpha$ for all k, it
follows that $\|x^{(k)} - x\|_\alpha \to 0$ if and only if $\|x^{(k)} - x\|_\beta \to 0$ as $k \to \infty$. □

5.4.7 Definition. Two norms are said to be equivalent if whenever a
sequence $\{x^{(k)}\}$ converges to a vector x with respect to the first norm, then
it converges to the same vector with respect to the second norm. Thus
(5.4.6) says that for finite-dimensional real or complex vector spaces, all
vector norms are equivalent. We have seen in Example (5.4.2) that two
different norms might not be equivalent on an infinite-dimensional space.

Since all vector norms on $\mathbf{R}^n$ or $\mathbf{C}^n$ are equivalent to $\|\cdot\|_\infty$, we have
$\lim_{k \to \infty} x^{(k)} = x$ with respect to any vector norm if and only if

$$\lim_{k \to \infty} x_i^{(k)} = x_i \quad \text{for all } i = 1, \ldots, n$$

Componentwise convergence (with respect to any basis) is equivalent to
convergence with respect to any vector norm.

Another important consequence of equivalence of all vector norms in
the finite-dimensional case is that the unit ball and unit sphere of every
vector norm is compact. This fact implies that a continuous complex-
valued function on the unit ball of any vector norm is bounded and that
it achieves its maximum and minimum if it is real-valued.

5.4.8 Corollary. Let $f(\cdot)$ be a pre-norm on $V = \mathbf{R}^n$ or $\mathbf{C}^n$. Then the
sets $\{x : f(x) \le 1\}$ and $\{x : f(x) = 1\}$ are compact. In particular, if $\|\cdot\|$ is a
vector norm on V, then the closed unit ball $\{x : \|x\| \le 1\}$ and the unit
sphere $\{x : \|x\| = 1\}$ are compact.

Proof: By (5.4.4) there is some $C > 0$ such that $\|x\|_2 \le C f(x)$ for all
$x \in V$, so the set $\{x : f(x) \le 1\}$ is a bounded set, which is contained in an
ordinary Euclidean ball of radius C centered at the origin. Both of the
sets $\{x : f(x) \le 1\}$ and $\{x : f(x) = 1\}$ are closed because $f(\cdot)$ is continuous.
Since a closed bounded set in $\mathbf{R}^n$ or $\mathbf{C}^n$ is compact, we are done. □

Often we are not confronted with the problem of determining whether
a given sequence $\{x^{(k)}\}$ converges to a given vector x, but rather with
determining whether a given sequence $\{x^{(k)}\}$ converges to anything at all.
For this reason, one needs to have a convergence criterion that is
independent of the limit x to which the sequence converges. If there were such
a limit x, then

$$\|x^{(k)} - x^{(j)}\| = \|x^{(k)} - x + x - x^{(j)}\| \le \|x^{(k)} - x\| + \|x - x^{(j)}\| \to 0$$

as $k, j \to \infty$. This is the motivation for the following.


5.4.9 Definition. A sequence $\{x^{(k)}\}$ in a vector space V is a Cauchy
sequence with respect to the vector norm $\|\cdot\|$ if for each $\epsilon > 0$ there is a
positive integer $N(\epsilon)$ such that

$$\|x^{(k_1)} - x^{(k_2)}\| \le \epsilon$$

whenever $k_1, k_2 \ge N(\epsilon)$.


5.4.10 Theorem. Let $\|\cdot\|$ be a given norm on a finite-dimensional real or
complex vector space V, and let $\{x^{(k)}\}$ be a given sequence of vectors in
V. The sequence $\{x^{(k)}\}$ converges to a vector in V if and only if it is a
Cauchy sequence with respect to the norm $\|\cdot\|$.


Proof: By choosing a basis $\mathcal{B}$ of V and considering the equivalent norm
$\|[x]_{\mathcal{B}}\|_\infty$, we see that there is no loss of generality if we assume that
$V = \mathbf{R}^n$ or $\mathbf{C}^n$ for some integer n and if we assume that the norm is $\|\cdot\|_\infty$.
If $\{x^{(k)}\}$ is a Cauchy sequence, then so is each component sequence $\{x_i^{(k)}\}$
of real or complex numbers for each i = 1, ..., n. Since a Cauchy
sequence of real or complex numbers must have a limit, this means that
for each i = 1, ..., n there is a scalar $x_i$ such that $\lim_{k \to \infty} x_i^{(k)} = x_i$; it is
easy to check that $\lim_{k \to \infty} x^{(k)} = x$, where $x = [x_1, \ldots, x_n]^T$. On the other
hand, if there is an x such that $\lim_{k \to \infty} x^{(k)} = x$, then
$\|x^{(k_1)} - x^{(k_2)}\| \le \|x^{(k_1)} - x\| + \|x - x^{(k_2)}\|$ and the given sequence is a Cauchy sequence. □


It is a fundamental property of the real and complex fields (used in the 
proof of the preceding theorem) that a sequence is a Cauchy sequence if 
and only if it converges to some (real or complex) scalar. This is known 
as the completeness property of the real and complex fields, and we have 
just shown that the completeness property extends to finite-dimensional 
real and complex vector spaces with respect to any vector norm. Unfor- 
tunately, the completeness property need not hold for vector spaces that 
are not finite-dimensional. 


5.4.11 Definition. A vector space V with a norm $\|\cdot\|$ is said to be
complete with respect to the norm $\|\cdot\|$ if every sequence that is a Cauchy
sequence with respect to the norm $\|\cdot\|$ converges to a point of V.
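The Cauchy criterion of Definition (5.4.9) can be explored numerically without knowing the limit in advance. The following sketch is an illustration only: the sequence `x_k` (componentwise partial sums of two convergent series) and the helper `tail_diameter` are hypothetical, and a finite window of indices stands in for "all $k_1, k_2 \ge N$".

```python
import math

def x_k(k):
    """Hypothetical sequence in R^2: partial sums of two convergent series."""
    geometric = sum(0.5 ** j for j in range(1, k + 1))                 # tends to 1
    exponential = sum(1.0 / math.factorial(j) for j in range(k + 1))   # tends to e
    return (geometric, exponential)

def max_norm(v):
    return max(abs(t) for t in v)

def tail_diameter(N, K):
    """max ||x^(k1) - x^(k2)||_inf over N <= k1, k2 <= K."""
    pts = [x_k(k) for k in range(N, K + 1)]
    return max(max_norm((p[0] - q[0], p[1] - q[1])) for p in pts for q in pts)

for N in (1, 5, 10, 20):
    print(N, tail_diameter(N, 40))   # the tail diameter shrinks as N grows
```

Because all norms on $\mathbf{R}^2$ are equivalent, replacing `max_norm` by another norm would change the printed numbers but not the qualitative behavior.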


Exercise. Consider the vector space C[0, 1] with the $L_1$ norm
$\|f\|_1 = \int_0^1 |f(t)|\,dt$, and consider the sequence of functions $\{f_k\}$ defined by

$$f_k(t) = 0, \qquad 0 \le t \le \tfrac{1}{2} - \tfrac{1}{k}$$
$$f_k(t) = \tfrac{k}{2}\left(t - \tfrac{1}{2} + \tfrac{1}{k}\right), \qquad \tfrac{1}{2} - \tfrac{1}{k} \le t \le \tfrac{1}{2} + \tfrac{1}{k}$$
$$f_k(t) = 1, \qquad \tfrac{1}{2} + \tfrac{1}{k} \le t \le 1$$

Sketch the functions $f_k$. Show that $\{f_k\}$ is a Cauchy sequence but that
there is no function $f \in C[0, 1]$ for which $\lim_{k \to \infty} f_k = f$ with respect to
$\|\cdot\|_1$.

Using the fact that the unit ball of any vector norm or pre-norm on $\mathbf{R}^n$
or $\mathbf{C}^n$ is compact, we can introduce another useful method of generating
new norms from old ones.

5.4.12 Definition. Let $f(\cdot)$ be a pre-norm on $V = \mathbf{R}^n$ or $\mathbf{C}^n$. The function

$$f^D(y) = \max_{f(x) = 1} \mathrm{Re}\, y^*x$$

is called the dual norm of f.

Note first that the dual norm is a well-defined function on V because
$\mathrm{Re}\, y^*x$ is a continuous function of x for each fixed $y \in V$, and the
set $\{x : f(x) = 1\}$ is a compact set by (5.4.8). By the Weierstrass theorem,
the maximum of $\mathrm{Re}\, y^*x$ is attained at some point $x_0 \in \{x : f(x) = 1\}$. If c is
a scalar such that $|c| = 1$, then by the homogeneity of f we have

$$\begin{aligned}
\max_{f(x) = 1} |y^*x| &= \max_{f(x) = 1}\ \max_{|c| = 1} \mathrm{Re}\, c\,y^*x \\
&= \max_{f(x) = 1}\ \max_{|c| = 1} \mathrm{Re}\, y^*(cx) \\
&= \max_{|c| = 1}\ \max_{f(x/c) = 1} \mathrm{Re}\, y^*x = \max_{f(x) = 1} \mathrm{Re}\, y^*x
\end{aligned}$$

so an equivalent and sometimes convenient definition for the dual norm
is

$$f^D(y) = \max_{f(x) = 1} |y^*x| \qquad (5.4.12a)$$

Finally, we must observe that the name dual norm for the function
$f^D$ is well deserved. The function $f^D(\cdot)$ is evidently homogeneous,
and it is positive, for if $y \ne 0$, we can use the homogeneity of $f(\cdot)$ to show
that

$$f^D(y) = \max_{f(x) = 1} |y^*x| \ge \left|y^*\frac{y}{f(y)}\right| = \frac{\|y\|_2^2}{f(y)} > 0$$


t d matrices 5.4 Analytic properties of vector norms 077 
276 Norms for vectors an 


It is perhaps remarkable that even if the function f(+) does not obey the 
triangle inequality, its dual f Ps) always does: 


= *y| < max [|y*x|+|z*x]] 
f Otz) = max [9+2 x| J&œx)=] | 


If we consider the Euclidean norm ||, a given nonzero vector y, and 
an arbitrary vector x, then the Cauchy-Schwarz inequality says that 
|y*x| = 


È Jx = [»lzixlz (5.4.15) 


D D 
< max |y*x|+ max zx =f O+S 2) 
J(x)=] Jo =l 
The dual norm of a pre-norm is therefore always a norm. in 
Thus, any pre-norm generates a norm by the process of constructing the dual norm. The most common instance of this construction is for a pre-norm that is actually a norm.

A simple inequality for the dual norm is given in the following lemma. We shall see that it is a natural generalization of the Cauchy-Schwarz inequality.


5.4.13 Lemma. Let $f(\cdot)$ be a pre-norm on $V = C^n$ or $R^n$. Then

$$|y^*x| \le f(x)\, f^D(y)$$
$$|y^*x| \le f^D(x)\, f(y)$$

for all $x, y \in V$.


Proof: If $x \ne 0$, then

$$\left| y^* \frac{x}{f(x)} \right| \le \max_{f(z)=1} |y^*z| = f^D(y)$$

and hence $|y^*x| \le f(x) f^D(y)$. Since this inequality also holds for $x = 0$, we are done. The second inequality follows from the first since $|y^*x| = |x^*y|$. □


It is easy to identify the duals of some of the most common vector norms. If $x, y \in C^n$, then a special case of (5.4.13) gives the inequality

$$|y^*x| = \left| \sum_{i=1}^n \bar{y}_i x_i \right| \le \sum_{i=1}^n |\bar{y}_i x_i| \le \max_{1 \le i \le n} |y_i| \sum_{j=1}^n |x_j| = \|y\|_\infty \|x\|_1 \qquad (5.4.14)$$

If $y$ is a given vector, then equality holds in (5.4.14) when $x$ is a unit vector (with respect to $\|\cdot\|_1$) such that $x_i = 1$ for some one value of $i$ for which $|y_i| = \|y\|_\infty$ and $x_j = 0$ otherwise. Similarly, if $x$ is a given nonzero vector, then equality holds in (5.4.14) when $y$ is a unit vector (with respect to $\|\cdot\|_\infty$) such that $y_i = x_i/|x_i|$ for all $i$ such that $x_i \ne 0$ and $y_i = 0$ otherwise. Thus

$$(\|y\|_1)^D = \max_{\|x\|_1 = 1} |y^*x| = \max_{\|x\|_1 = 1} \|y\|_\infty \|x\|_1 = \|y\|_\infty$$
$$(\|y\|_\infty)^D = \max_{\|x\|_\infty = 1} |y^*x| = \max_{\|x\|_\infty = 1} \|y\|_1 \|x\|_\infty = \|y\|_1$$

We conclude that $(\|\cdot\|_1)^D = \|\cdot\|_\infty$ and $(\|\cdot\|_\infty)^D = \|\cdot\|_1$.

For the Euclidean norm, the Cauchy-Schwarz inequality gives $|y^*x| \le \|x\|_2 \|y\|_2$, with equality when $x = y/\|y\|_2$. Using the same argument as above for the $l_1$ and $l_\infty$ norms, we find that $(\|y\|_2)^D = \|y\|_2$, so the Euclidean norm is its own dual.


Exercise. Explain why the inequalities in (5.4.13) are a generalization of the Cauchy-Schwarz inequality (5.1.4).


Notice that for each of the three norms just considered ($l_1$, $l_2$, and $l_\infty$), the dual of the dual norm is the original norm. This is no accident; the duality theorem (5.5.14) says that this always happens.

Among these three examples, the only norm that equals its dual is the Euclidean norm. It is not difficult to show that this is also no accident.


5.4.16 Theorem. Let $\|\cdot\|$ be a vector norm on $V = R^n$ or $C^n$, let $\|\cdot\|^D$ be its dual norm, and let $c > 0$ be given. Then $\|x\| = c\|x\|^D$ for all $x \in V$ if and only if $\|\cdot\| = \sqrt{c}\,\|\cdot\|_2$. In particular, $\|\cdot\| = \|\cdot\|^D$ if and only if $\|\cdot\|$ is the Euclidean norm $\|\cdot\|_2$.


Proof: If $\|\cdot\| = \sqrt{c}\,\|\cdot\|_2$ and $x \in V$, then

$$\|x\|^D = \max_{\|y\|=1} |x^*y| = \max_{\|y\|_2 = 1/\sqrt{c}} |x^*y| = \max_{\|z\|_2 = 1} \left| x^* \frac{z}{\sqrt{c}} \right| = \frac{1}{\sqrt{c}} \max_{\|z\|_2=1} |x^*z| = \frac{1}{\sqrt{c}} \|x\|_2^D = \frac{1}{\sqrt{c}} \|x\|_2 = \frac{1}{c}\|x\|$$

for any $x \in V$. Conversely, if $\|\cdot\| = c\|\cdot\|^D$ for some $c > 0$ and if $x \in V$, then

$$\|x\|_2^2 = |x^*x| \le \|x\|\,\|x\|^D = \frac{1}{c}\|x\|^2$$

so $\|x\| \ge \sqrt{c}\,\|x\|_2$. We can use this inequality to establish the reverse bound if $x \ne 0$ by considering

$$\|x\| = c\|x\|^D = c \max_{\|y\|=1} |x^*y| = c \max_{y \ne 0} \left| x^* \frac{y}{\|y\|} \right| = c \max_{y \ne 0} \frac{\|y\|_2}{\|y\|} \left| x^* \frac{y}{\|y\|_2} \right| \le c \max_{y \ne 0} \frac{1}{\sqrt{c}} \left| x^* \frac{y}{\|y\|_2} \right| = \sqrt{c}\,\|x\|_2$$

where we have used the fact that $\|y\|_2/\|y\| \le 1/\sqrt{c}$ for all $y \ne 0$; the Cauchy-Schwarz inequality guarantees that the maximum absolute value of the inner product between a fixed nonzero vector and a Euclidean unit vector occurs when the unit vector is parallel to the given vector. Thus, $\|x\| \le \sqrt{c}\,\|x\|_2$ for all $x \in V$, which, together with the reverse inequality that we have already proved, shows that $\|x\| = \sqrt{c}\,\|x\|_2$ for all $x \in V$. The final assertion follows when $c = 1$ and shows that the Euclidean norm is the only norm that equals its dual. □
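The identities $(\|\cdot\|_1)^D = \|\cdot\|_\infty$ and $(\|\cdot\|_\infty)^D = \|\cdot\|_1$ can be checked numerically. The following is a minimal sketch, not part of the text: it assumes real vectors and relies on the fact that a convex function on a polytope attains its maximum at a vertex, so only the vertices of each unit ball are tested. The function names are illustrative.

```python
from itertools import product

def dot(y, x):
    """Real inner product y^T x."""
    return sum(yi * xi for yi, xi in zip(y, x))

def dual_of_l1(y):
    """max |y^T x| over the l1 unit ball, tested at its vertices +/- e_i."""
    n = len(y)
    best = 0.0
    for i in range(n):
        e = [0.0] * n
        e[i] = 1.0
        best = max(best, abs(dot(y, e)))  # -e_i gives the same absolute value
    return best

def dual_of_linf(y):
    """max |y^T x| over the l_inf unit ball, tested at its 2^n sign vectors."""
    return max(abs(dot(y, s)) for s in product((1.0, -1.0), repeat=len(y)))

y = [3.0, -4.0, 1.5]
print(dual_of_l1(y), max(abs(v) for v in y))    # both are the l_inf norm of y
print(dual_of_linf(y), sum(abs(v) for v in y))  # both are the l1 norm of y
```

The sign-vector enumeration grows as $2^n$, so this is only meant for small illustrative cases.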


As a final remark, we observe that there is a useful sense in which a vector, as well as a vector norm, has a dual.


5.4.17 Definition. Let $x \in C^n$ be a given vector and let $\|\cdot\|$ be a given vector norm on $C^n$. The set

$$\{ y \in C^n : \|y\|^D \|x\| = y^*x = 1 \}$$

is said to be the dual of $x$ with respect to $\|\cdot\|$. An ordered pair of vectors $(x, y) \in C^n \times C^n$ is said to be a dual pair with respect to $\|\cdot\|$ if $y$ is in the dual of $x$ with respect to the norm $\|\cdot\|$.

It follows from Corollary (5.5.15) that if $\|\cdot\|$ is a vector norm, then the dual of every vector $x \in C^n$ with respect to $\|\cdot\|$ is nonempty. It could consist of one point or many. If $\|\cdot\| = \|\cdot\|_2$, for example, then the dual of every vector $x \in C^n$ is the one vector $x$ itself. If $\|\cdot\| = \|\cdot\|_\infty$, on the other hand, the dual of $x = [0, 1]^T$ consists of a single vector, but the dual of $x = [1, 1]^T$ contains infinitely many vectors. See Problem 13.


Problems

1. Note that (5.4.5) may be stated equivalently as

$$C_m(\|\cdot\|_\alpha, \|\cdot\|_\beta)\, \|x\|_\alpha \le \|x\|_\beta \le C_M(\|\cdot\|_\alpha, \|\cdot\|_\beta)\, \|x\|_\alpha$$

where $C_m(\cdot, \cdot)$ and $C_M(\cdot, \cdot)$ denote the best possible constants relating the respective norms in (5.4.5). Show that $C_m(\|\cdot\|_\beta, \|\cdot\|_\alpha) = C_M(\|\cdot\|_\alpha, \|\cdot\|_\beta)^{-1}$.

2. Express $C_m(\|\cdot\|_\alpha, \|\cdot\|_\gamma)$ in terms of $C_m(\|\cdot\|_\alpha, \|\cdot\|_\beta)$ and $C_m(\|\cdot\|_\beta, \|\cdot\|_\gamma)$, where the constants involved need not be best possible. Do likewise for $C_M$.

3. Verify that the accompanying table gives the best bounds $C_M(\|\cdot\|_\alpha, \|\cdot\|_\beta)$ between the $l_1$, $l_2$, and $l_\infty$ norms; that is, $\|x\|_\alpha \le C_M \|x\|_\beta$ for all $x \in C^n$ and for $\alpha, \beta = 1, 2, \infty$. In each case show that the bound is best possible by exhibiting a nonzero vector $x$ such that $\|x\|_\alpha = C_M \|x\|_\beta$.

    α \ β     1      2      ∞
      1       1     √n      n
      2       1      1     √n
      ∞       1      1      1

What is the table of best lower bounds $\|x\|_\alpha \ge C_m \|x\|_\beta$? Hint: See Problem 1.

4. Show that if two norms on a real or complex vector space are equivalent, then they are related by two constants and an inequality as in (5.4.5). Hint: Consider $f(x) = 1/\|x\|_\alpha$ on the unit sphere $S$ of $\|\cdot\|_\beta$. If $f$ is unbounded on $S$, there is a sequence $\{x_k\} \subset S$ with $\|x_k\|_\alpha < 1/k$ and $\|x_k\|_\beta = 1$, which contradicts equivalence of $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$. Notice that this has nothing to do with finite dimensionality or compactness.

5. Show that the functions $f_k$ of (5.4.2) have the property that $f_k(x) \to 0$ for each $x$, $\|f_k - f_j\|_1 \to 0$ as $k, j \to \infty$, and for each $k \ge 2$ there is some $J$ for which $\|f_k - f_j\|_\infty > k^{1/2}$ for all $j > J$. Thus, a sequence can be convergent in one sense (point-wise), Cauchy in one norm, and not Cauchy in another norm.

6. Let $V$ be a complete real or complex vector space, let $\{x^{(k)}\}$ be a given sequence in $V$, and let $\|\cdot\|$ be a given vector norm on $V$. If there is an $M \ge 0$ such that $\sum_{k=1}^n \|x^{(k)}\| \le M$ for all $n = 1, 2, \ldots$, show that the sequence of partial sums $\{y^{(n)}\}$ defined by $y^{(n)} = \sum_{k=1}^n x^{(k)}$ converges to a point of $V$. What theorem about convergence of infinite series of real numbers does this generalize?

7. Show that $\|x\|_\infty = \lim_{p \to \infty} \|x\|_p$ for every $x \in C^n$.

8. If $a > 0$ and $\|\cdot\|_\alpha = a\|\cdot\|$, show that $(\|\cdot\|_\alpha)^D = (1/a)\|\cdot\|^D$.

9. Show that the dual norm of the $l_p$ norm is the $l_q$ norm for any $p \ge 1$, where $q$ is defined by the relation $1/p + 1/q = 1$. Hint: Replace (5.4.14) with the general form of Hölder's inequality.

10. Let $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ be two given vector norms on $C^n$, and suppose there is some $C > 0$ such that $\|x\|_\alpha \le C\|x\|_\beta$ for all $x \in C^n$. Show that $\|x\|_\alpha^D \ge (1/C)\|x\|_\beta^D$ for all $x \in C^n$. Hint:

$$\|x\|_\alpha^D = \max_{\|y\|_\alpha = 1} |y^*x| = \max_{y \ne 0} \frac{|y^*x|}{\|y\|_\alpha} \ge \max_{y \ne 0} \frac{|y^*x|}{C\|y\|_\beta} = \frac{1}{C} \max_{\|y\|_\beta = 1} |y^*x|$$




11. Show that the group of isometries of $\|\cdot\|^D$ always contains the set of adjoints of all the isometries of $\|\cdot\|$. Deduce from this that the group of isometries of $\|\cdot\|^D$ is exactly the set of adjoints of the group of isometries of $\|\cdot\|$. When will the isometries of $\|\cdot\|$ and $\|\cdot\|^D$ be the same?


16. Let $\|\cdot\|$ be a vector norm on $V = R^n$ or $C^n$. Show that


kazo PP pesi piei a) (gz hl: 
12. Let |e] be a vector norm on C”, and let Te Mp. Show that |-|?= 
|-|? if T is an isometry with respect to ||. 


13. Let |-] be a given vector norm on C”. (a) Show that the dual of 0 with and that D > 
respect to |e] is always {0}. (b) Use Corollary (5.5.15) to show that the min 2S | min |x|] =C 
dual of every xe C” is nonempty. Hint: If | y|? =1 and yx = |x|, de- x20 |X| jxf=1 7) 


termine c= 0 for which y = cy is in the dual of x. (c) Let |+| be the Eu- 
clidean norm |» |z. Show that the dual of every x e C” is {x}. (d) Let [>| = 


|-|... Show that the dual of x = [o] is {x}; the dual of x = [ i] is the two 
line segments from [5] to [3] to [o]. (e) Let |e] = |e]. Show that the 


Deduce that $C_m \|x\| \le \|x\|^D \le C_M \|x\|$ for all $x \in V$, so geometrical constants give bounds between every norm and its dual.


17. Let (+) be a pre-norm on R” or C”. Show that 
fP(w)= max Re y*x = max |y*x| 


dual of x = [3] is (| l i and the dual of x = [2] is the two line segments | fos! Jo) sI 
- . . Re y*x | y*x| 
from -| ~!] to [°] to —| | |. (£) Show that y is in the dual of x with re- - = max = IZ X 
| i] [7] [i] (£) y l veo JO R (5.4.18) 


spect to |» | if and only if x is in the dual of y with respect to | |P. (g) Show ` 
that the dual of x with respect to |+| is {x} for every x e C” if and only if | 
J+] = Iolo. , 
14. Consider the function $f: R^2 \to R$ given by $f(x) = |x_1 x_2|^{1/2}$. Show that the set $\{x : f(x) = 1\}$ is not compact. Does this contradict (5.4.8)?


See the exercise after (5.6.1) for another instance of this idea. 


Further Readings. See [Hou 64] for a discussion of dual norms. The idea that the dual of a pre-norm is a norm seems to be due to J. Von Neumann, who discussed “gauge functions” (what we call vector norms) in “Some Matrix-Inequalities and Metrization of Matric-Space,” Tomsk Univ. Rev. 1 (1937), 205-218. A more readily available source for this paper may be vol. 4 of Von Neumann’s Collected Works, ed. A. H. Taub, Macmillan, New York, 1962.


15. Consider the example of a pre-norm $f(x) = (\|x\|_\alpha \|x\|_\beta)^{1/2}$ on $R^2$ given in the text, where

$$\|x\|_\alpha = \|[10x_1, x_2]^T\|_\infty, \qquad \|x\|_\beta = \|[x_1, 10x_2]^T\|_\infty$$

Show that the portion of the “unit ball” $\{x \in R^2 : f(x) \le 1\}$ in the first quadrant is bounded by segments of the lines $x_1 = 1/\sqrt{10}$ and $x_2 = 1/\sqrt{10}$ and an arc of the hyperbola $x_1 x_2 = 1/100$. Sketch this set and show that it is not convex. Why is the rest of the “unit ball” in the three remaining quadrants obtained by successive reflections of this set across the axes? Show that the unit ball of the dual norm $\{x \in R^2 : f^D(x) \le 1\}$ is bounded in the first quadrant by segments of the lines $x_1/10 + x_2 = \sqrt{10}$ and $x_1 + x_2/10 = \sqrt{10}$, that the whole unit ball of $f^D$ is obtained by successive reflections of the portion in the first quadrant, and that it is convex. Show that the portion of the unit ball of $f^{DD}$ in the first quadrant is bounded by segments of the lines $x_2 = 1/\sqrt{10}$, $x_1 = 1/\sqrt{10}$, and $x_1 + x_2 = 11/(10\sqrt{10})$, that the rest of the unit ball is obtained by successive reflections of this set across the axes, and that it is convex. Finally, compare the unit ball of $f^{DD}$ with that of $f$ and show that the former is exactly the closed convex hull of the latter.


5.5 Geometric properties of vector norms 


The primary geometric feature of a vector norm is its unit ball, through which considerable insight about the norm may be gained.


5.5.1 Definition. Let $\|\cdot\|$ be a vector norm on the real or complex vector space $V$, let $x$ be a point of $V$, and let $r > 0$ be given. The ball of radius $r$ around $x$ is the set

$$B_{\|\cdot\|}(r; x) = \{ y \in V : \|y - x\| \le r \}$$

The unit ball of $\|\cdot\|$ is the set

$$B_{\|\cdot\|} = B_{\|\cdot\|}(1; 0) = \{ y \in V : \|y\| \le 1 \}$$


Exercise. Show that for every $r > 0$ and for every $x \in V$, $B(r; x) = \{ y + x : y \in B(r; 0) \} = x + B(r; 0)$.




A ball of given radius around any point $x$ looks the same as a ball of the same radius around zero; it is just translated to the point $x$. The unit ball is a geometric summary of a norm, which, because of the homogeneity property, characterizes the norm (actually only the boundary of $B_{\|\cdot\|}$ is needed). Here we determine exactly which subsets of $V$ can be the unit ball of some vector norm.


Exercise. Sketch the unit balls for the $l_1$, $l_2$, and $l_\infty$ norms on $R^2$. Are there any containment relationships? Which points must be on the boundary of the unit ball of any $l_p$ norm on $R^2$? Sketch the unit balls of some other $l_p$ norms.


Exercise. If $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are two vector norms on a vector space $V$, show that $\|x\|_\alpha \le \|x\|_\beta$ for all $x \in V$ if and only if $B_{\|\cdot\|_\beta} \subset B_{\|\cdot\|_\alpha}$. The natural partial order on vector norms may therefore be expressed in terms of geometric containment. What happens to the unit ball when a norm is multiplied by a positive constant?


Exercise. If $\|\cdot\|$ is a vector norm on $V$, if $x \in V$, and if $\alpha$ is a scalar such that $\|\alpha x\| = \|x\|$, show that either $x = 0$ or $|\alpha| = 1$. Conclude that each “ray” $\{\alpha x : \alpha > 0\}$ intersects the boundary of the unit ball of $\|\cdot\|$ exactly once.


5.5.2 Definition. A vector norm is called polyhedral if its unit ball is a polyhedron.


Exercise. Which of the $l_p$ norms are polyhedral?


Exercise. If $\|\cdot\|$ is a polyhedral norm and if $S \in M_n$ is nonsingular, is $\|\cdot\|_S$ polyhedral?


The basic topological notions of open and closed sets are very easy to 
define in a vector space that has a norm. 


5.5.3 Definition. Let $\|\cdot\|$ be a norm on the real or complex vector space $V$, and let $S$ be a subset of $V$. A point $x \in S$ is said to be an interior point of $S$ if there is some $\varepsilon > 0$ such that $B_{\|\cdot\|}(\varepsilon; x) \subset S$. The set $S$ is said to be open if every point of $S$ is an interior point; $S$ is said to be closed if its complement is open. A limit point of $S$ is a point $x \in V$ such that $\lim_{k \to \infty} x^{(k)} = x$ (with respect to $\|\cdot\|$) for some sequence $\{x^{(k)}\} \subset S$. The closure of $S$ is the union of $S$ with the set of limit points of $S$. The boundary of $S$ is the intersection of the closure of $S$ with the closure of the complement of $S$. The set $S$ is bounded if there exists some $M > 0$ such that $S \subset B_{\|\cdot\|}(M; 0)$. The set $S$ is compact if from every covering $\bigcup_\alpha S_\alpha \supset S$ by open sets $S_\alpha$ one can extract finitely many sets $S_{\alpha_1}, \ldots, S_{\alpha_N}$ such that $\bigcup_{i=1}^N S_{\alpha_i} \supset S$.


Exercise. Show that the unit ball $B_{\|\cdot\|}$ is closed and bounded for any vector norm $\|\cdot\|$ on any real or complex vector space $V$.

Exercise. Let $V$ be a finite-dimensional real or complex vector space, and let $S \subset V$ be a closed bounded set. Use the fact that $V$ is isomorphic to $R^n$ or $C^n$ for some $n$ (see Appendix E) to show that $S$ is compact.


5.5.4 Observation. If $\|\cdot\|$ is a vector norm on a nontrivial (i.e., not zero-dimensional) real or complex vector space $V$, then 0 is an interior point of the unit ball $B_{\|\cdot\|}$. This follows from the homogeneity and positivity of the norm $\|\cdot\|$, which implies that $B_{\|\cdot\|}(\tfrac{1}{2}; 0) \subset B_{\|\cdot\|}(1; 0)$, with the boundary of the former being in the interior of the latter.


5.5.5 Observation. The unit ball of a vector norm is equilibrated; that is, if $x$ is in the unit ball, then so is $\alpha x$ for all scalars $\alpha$ such that $|\alpha| = 1$. This follows from the homogeneity property of the vector norm.


5.5.6 Observation. The unit ball of a vector norm on a finite-dimensional
vector space is compact. It is bounded because of the homogeneity
property of vector norms and it is closed because the norm is always a 
continuous function. In the finite-dimensional case, a closed bounded set 
is compact, but this is not always true in the infinite-dimensional case. 
The property of compact sets that we shall use most frequently is the 
Weierstrass theorem (see Appendix E): A continuous real-valued func- 
tion on a compact set is bounded and achieves both its supremum and 
infimum on the set. For this reason, we usually refer to the “max” or 
“min” of such a function. 


Exercise. Consider the complex vector space $l_2$ of vectors $x = (x_i)$ with countably many components with the natural extension of the finite-dimensional $l_2$ norm

$$\|x\|_2 = \left( \sum_{i=1}^\infty |x_i|^2 \right)^{1/2}$$

Show that $\|e_k - e_j\|_2 = \sqrt{2}$ for every pair of distinct unit basis vectors $e_k$ and $e_j$, $k, j = 1, 2, \ldots$. Thus, no infinite subsequence of $\{e_k\}$ can be a Cauchy sequence, so there can be no convergent subsequence. Conclude that the unit ball of $l_2$ cannot be compact.


5.5.7 Observation. The unit ball of a vector norm is convex.


Proof: If $\|x\| \le 1$, $\|y\| \le 1$, and $\alpha \in [0, 1]$, then

$$\|\alpha x + (1-\alpha) y\| \le \|\alpha x\| + \|(1-\alpha) y\| = \alpha\|x\| + (1-\alpha)\|y\| \le \alpha + (1-\alpha) = 1$$

so that $\alpha x + (1-\alpha) y$ lies in the unit ball also. □


The foregoing necessary conditions on the unit ball of a norm are also 
sufficient to characterize a norm. 


5.5.8 Theorem. A set $B$ in a finite-dimensional real or complex vector space $V$ is the unit ball of a vector norm on $V$ if and only if $B$ is a (i) compact, (ii) convex, (iii) equilibrated set, (iv) with 0 as an interior point.


Proof: That conditions (i)-(iv) are necessary has already been observed. To see that they suffice for the definition of a norm, consider any nonzero point $x \in V$. Construct a ray segment $\{\alpha x : 0 \le \alpha \le 1\}$ from the origin through $x$ and define the “length” of $x$ by the proportional distance along this ray from the origin to $x$ with the length of the interval of the ray from the origin to the unique point on the boundary of the unit ball serving as one unit. More formally, define $\|x\|$ by

$$\|x\| = 0 \quad \text{if } x = 0$$
$$\|x\| = \min\left\{ \frac{1}{t} : t > 0 \text{ and } tx \in B \right\} \quad \text{if } x \ne 0$$

This function is well defined, finite, and positive for each nonzero vector $x$ because $B$ is compact and has 0 as an interior point. Using the equilibration assumption, it is easy to see that $\|\cdot\|$ is a homogeneous function, so it remains only to check that it satisfies the triangle inequality. If $x$ and $y$ are given nonzero vectors, then $x/\|x\|$ and $y/\|y\|$ are unit vectors that lie on the boundary of $B$. By convexity, the vector

$$z = \frac{\|x\|}{\|x\| + \|y\|} \cdot \frac{x}{\|x\|} + \frac{\|y\|}{\|x\| + \|y\|} \cdot \frac{y}{\|y\|}$$

must also lie in $B$. Therefore, $\|z\| \le 1$, and one easily computes that this is equivalent to $\|x + y\| \le \|x\| + \|y\|$. □
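The construction in the proof of (5.5.8) can be tried out numerically. The sketch below is illustrative only and not from the text: it assumes the set $B$ is supplied as a membership test on real vectors and satisfies hypotheses (i)-(iv), so that the set of $t \ge 0$ with $tx \in B$ is an interval $[0, t_{\max}]$ that bisection can locate. The names `gauge` and `in_ball` are hypothetical.

```python
def gauge(x, in_ball, iters=80):
    """Approximate min{1/t : t > 0 and t*x in B} for a set B given by in_ball.

    Assumes B is compact, convex, and has 0 as an interior point, so the
    values t >= 0 with t*x in B form an interval [0, t_max], 0 < t_max < inf.
    """
    if all(v == 0 for v in x):
        return 0.0
    lo, hi = 0.0, 1.0
    while in_ball([hi * v for v in x]):   # grow until t*x leaves B (B is bounded)
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):                # bisect toward the boundary value t_max
        mid = 0.5 * (lo + hi)
        if in_ball([mid * v for v in x]):
            lo = mid
        else:
            hi = mid
    return 1.0 / hi

def l1_ball(z):
    """Membership test for the l1 unit ball."""
    return sum(abs(v) for v in z) <= 1.0

def ellipse(z):
    """Membership test for the ellipse x1^2/4 + x2^2 <= 1."""
    return z[0] ** 2 / 4.0 + z[1] ** 2 <= 1.0

print(gauge([3.0, -4.0], l1_ball))  # close to the l1 norm of [3, -4]
print(gauge([2.0, 0.0], ellipse))   # close to 1: the point is on the boundary
```

With the $l_1$ ball as input the recovered function agrees with $\|x\|_1$ up to the bisection tolerance, which is the sense in which the unit ball determines the norm.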


Exercise. Provide the details in the proof of (5.5.8), noting carefully where each of the four hypotheses is used.


All of the familiar $l_p$ vector norms have the property that $\|x\|$ depends only on the absolute values of the entries of $x$. Moreover, each $l_p$ norm is an increasing function of the absolute values of the entries of $x$. These two properties are not unrelated.


5.5.9 Definition. If $x = [x_i] \in F^n$ ($R^n$ or $C^n$), we define $|x| = [|x_i|]$. We say that $|x| \le |y|$ if $|x_i| \le |y_i|$ for all $i = 1, \ldots, n$. A vector norm $\|\cdot\|$ on $F^n$ is said to be

(a) Monotone if $|x| \le |y|$ implies $\|x\| \le \|y\|$ for all $x, y \in F^n$
(b) Absolute if $\|x\| = \|\,|x|\,\|$ for all $x \in F^n$


5.5.10 Theorem. A vector norm $\|\cdot\|$ on $F^n$ ($R^n$ or $C^n$) is monotone if and only if it is absolute.


Proof: If $\|\cdot\|$ is monotone, and if $x \in F^n$, let $y = |x|$. Then $|y| \le |x|$ and $|x| \le |y|$, so $\|y\| \le \|x\|$ and $\|x\| \le \|y\|$, and hence $\|\cdot\|$ is absolute. If $\|\cdot\|$ is absolute, let $x = [x_i] \in F^n$ be a given vector, let $k$ be a given integer with $1 \le k \le n$, and let $\alpha \in [0, 1]$. Then

$$\|[x_1, \ldots, x_{k-1}, \alpha x_k, x_{k+1}, \ldots, x_n]^T\|$$
$$= \left\| \tfrac{1}{2}(1-\alpha)[x_1, \ldots, x_{k-1}, -x_k, x_{k+1}, \ldots, x_n]^T + \tfrac{1}{2}(1-\alpha)x + \alpha x \right\|$$
$$\le \tfrac{1}{2}(1-\alpha)\|[x_1, \ldots, x_{k-1}, -x_k, x_{k+1}, \ldots, x_n]^T\| + \tfrac{1}{2}(1-\alpha)\|x\| + \alpha\|x\|$$
$$= \tfrac{1}{2}(1-\alpha)\|x\| + \tfrac{1}{2}(1-\alpha)\|x\| + \alpha\|x\| = \|x\| \qquad (5.5.11)$$

The assumption that the norm is absolute is used only in the penultimate equality. By repeating this argument for different components, one can show that an absolute norm has the property that

$$\|[\alpha_1 x_1, \ldots, \alpha_n x_n]^T\| \le \|[x_1, \ldots, x_n]^T\| \qquad (5.5.12)$$

for all $x \in F^n$ and all choices of $\alpha_k \in [0, 1]$, $k = 1, \ldots, n$. Finally, if $|x| \le |y|$, then for each $k = 1, \ldots, n$ there are real numbers $\alpha_k$ and $\theta_k$ with $\alpha_k \in [0, 1]$ and $x_k = \alpha_k e^{i\theta_k} y_k$. Then using the absolute property we have

$$\|x\| = \|[\alpha_1 e^{i\theta_1} y_1, \ldots, \alpha_n e^{i\theta_n} y_n]^T\| = \|[\alpha_1 |y_1|, \ldots, \alpha_n |y_n|]^T\| \le \|[|y_1|, \ldots, |y_n|]^T\| = \|y\|$$

so the norm must be monotone. □


The inequality (5.5.11) suggests a slightly weaker notion of monotonicity.


5.5.13 Definition. A vector norm $\|\cdot\|$ on $F^n$ ($R^n$ or $C^n$) is said to be weakly monotone if

$$\|[x_1, \ldots, x_{k-1}, 0, x_{k+1}, \ldots, x_n]^T\| \le \|[x_1, \ldots, x_{k-1}, x_k, x_{k+1}, \ldots, x_n]^T\|$$

for all $x \in F^n$ and all $k = 1, \ldots, n$.

If a vector norm $\|\cdot\|$ is weakly monotone, and if $\alpha \in [0, 1]$, then

$$\|[x_1, \ldots, x_{k-1}, \alpha x_k, x_{k+1}, \ldots, x_n]^T\|$$
$$= \|(1-\alpha)[x_1, \ldots, x_{k-1}, 0, x_{k+1}, \ldots, x_n]^T + \alpha x\|$$
$$\le (1-\alpha)\|[x_1, \ldots, x_{k-1}, 0, x_{k+1}, \ldots, x_n]^T\| + \alpha\|x\|$$
$$\le (1-\alpha)\|x\| + \alpha\|x\| = \|x\|$$


so a weakly monotone norm satisfies the apparently stronger condition 
(5.5.12). Thus, if a point on the unit sphere of a weakly monotone norm 
is given, and if one of its coordinates is shrunk to zero, the entire line 
segment thus produced must be in the unit ball. A monotone norm is 
obviously weakly monotone, but not conversely, as the following exer- 
cises show. 
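Theorem (5.5.10) can be illustrated on a small example. The sketch below is not from the text; it assumes real vectors in $R^2$ and compares the $l_1$ norm with a hypothetical norm $\|Sx\|_\infty$, where $S = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ is chosen only for illustration. The second norm fails to be absolute at one point and fails to be monotone at a related pair, as the theorem predicts.

```python
def l1(x):
    """The l1 norm, which is absolute and monotone."""
    return sum(abs(v) for v in x)

def sheared(x):
    """||Sx||_inf with S = [[1, 1], [0, 1]]; a norm because S is nonsingular."""
    return max(abs(x[0] + x[1]), abs(x[1]))

def is_absolute_at(norm, x):
    """Check ||x|| == || |x| || at a single point x."""
    return norm(x) == norm([abs(v) for v in x])

x, y = [1.0, 1.0], [1.0, -1.0]   # |x| <= |y| entrywise (in fact |x| = |y|)
print(is_absolute_at(l1, y), l1(x) <= l1(y))                 # True True
print(is_absolute_at(sheared, y), sheared(x) <= sheared(y))  # False False
```

A single failing point is enough to show that a norm is not absolute or not monotone; showing that a norm has either property requires an argument, not sampling.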


Exercise. Show that the parallelogram with vertices at $\pm[2, 2]^T$ and $\pm[1, -1]^T$ is the unit ball of a vector norm on $R^2$ that is not weakly monotone.


Exercise. Is the function $f(x) = |x_1 - x_2| + |x_2|$ a vector norm on $R^2$? Is it monotone? Is it weakly monotone? Sketch its unit ball.

Exercise. Let $\|\cdot\|$ be an absolute norm on $R^2$. Show that if $x = [x_1, x_2]^T$ is a point on the boundary of the unit ball, then so are the points $[\pm x_1, \pm x_2]^T$ (all four possible choices). Illustrate this geometric property with a sketch and exhibit a unit ball of a vector norm on $R^2$ that is not absolute. What happens in $R^n$?


Exercise. Sketch the polygon in $R^2$ with vertices at $\pm[0, 1]^T$, $\pm[1, 0]^T$, and $\pm[1, 1]^T$. Explain why it is the unit ball of a vector norm on $R^2$ that is weakly monotone but not monotone or absolute.


The convexity of the unit ball of a vector norm is a fact with many 
deep and sometimes startling implications. One of these is the following 
duality theorem, which we state generally in the context of pre-norms. 
The key ideas involved are very natural geometric ones, namely that the 
smallest closed convex set containing a given set S (the closed convex hull 
Co S; see Appendix B) is the intersection of all closed half-spaces (every- 
thing on one side of a hyperplane) containing S, and that if there is a 


point x, which, whenever S lies in a half-space, also lies in the same half-space, then x must belong to the closed convex hull of S. These simple notions lead directly to the important fact that the second dual of a vector norm is identical to the original norm.


5.5.14 Theorem (duality theorem). Let $f(\cdot)$ be a pre-norm on $V = R^n$ or $C^n$, let $f^D$ denote the dual norm of $f$ and $f^{DD}$ the dual norm of $f^D$, and let

$$B = \{x \in V : f(x) \le 1\}, \qquad B'' = \{x \in V : f^{DD}(x) \le 1\}$$

denote the “unit ball” of $f$ and the unit ball of $f^{DD}$, respectively. Then

$$B \subset B'' = \mathrm{Co}\, B$$

and hence $f^{DD}(x) \le f(x)$ for all $x \in V$. If $f$ is a vector norm on $V$, then $B = B''$ and $f^{DD} = f$.


Proof: If $x \in V$ is a given vector, then (5.4.13) says that

$$|y^*x| \le f(x)\, f^D(y)$$

for any $y \in V$, and hence

$$f^{DD}(x) = \max_{f^D(y) = 1} |x^*y| \le \max_{f^D(y) = 1} f(x)\, f^D(y) = f(x)$$

Thus, $f^{DD}(x) \le f(x)$ for all $x \in V$, an inequality that is equivalent to the geometric statement $B \subset B''$.

To prove the second inclusion it is convenient to use the characterization (5.4.18) of the dual norm and to observe that the set $\{t \in V : \mathrm{Re}\, t^*v \le 1\}$ is a general closed half-space that contains the origin. Using the definition of the dual norm, let $u \in B''$ be a given point and observe that

$$u \in \{t : \mathrm{Re}\, t^*v \le 1 \text{ for every } v \text{ such that } f^D(v) \le 1\}$$
$$= \{t : \mathrm{Re}\, t^*v \le 1 \text{ for every } v \text{ such that } \mathrm{Re}\, v^*w \le 1 \text{ for every } w \text{ such that } f(w) \le 1\}$$
$$= \{t : \mathrm{Re}\, t^*v \le 1 \text{ for every } v \text{ such that } \mathrm{Re}\, w^*v \le 1 \text{ for all } w \in B\}$$

This says that $u$ lies in every closed half-space that has the property that it contains every point of $B$; that is, $u$ lies in every closed half-space that contains $B$. Since the intersection of all such closed half-spaces is the closed convex hull of $B$, $\mathrm{Co}\, B$, we conclude that $u \in \mathrm{Co}\, B$. But the point $u \in B''$ was arbitrary, so $B'' \subset \mathrm{Co}\, B$. Since $\mathrm{Co}\, B$ is the intersection of all closed convex sets containing $B$, and $B''$ is such a set, we also have $\mathrm{Co}\, B \subset B''$ and hence $B'' = \mathrm{Co}\, B$.

If the pre-norm $f$ is actually a norm, then its closed unit ball $B$ is convex, so $B = \mathrm{Co}\, B = B''$. Since their unit balls are identical, the norms $f$ and $f^{DD}$ are the same. □


One application of the duality theorem is the following useful result. 
It is a special case of a finite-dimensional version of an important general 
result from functional analysis known as the Hahn-Banach theorem. 


5.5.15 Corollary. Let $y \in C^n$ be a given vector and let $\|\cdot\|$ be a given vector norm on $C^n$. There exists a vector $y_0 \in C^n$ such that

(a) $|(y_0)^*x| \le \|x\|$ for all $x \in C^n$; and

(b) $(y_0)^*y = \|y\|$.

The vector $y_0$ is not necessarily unique, but $\|y_0\|^D = 1$ and $(y_0)^*y = \|y\|$.


Proof: We know that

$$\|y\| = (\|y\|^D)^D = \max_{\|z\|^D = 1} |y^*z|$$

by the duality theorem, and we know by compactness of the unit sphere of the vector norm $\|\cdot\|^D$ that the maximum is actually achieved for some [not necessarily unique] vector $z = y_0$ such that $\|y_0\|^D = 1$, so $\|y\| = |y^*y_0|$. By multiplying $y_0$ by a suitable factor of modulus 1 it is clear that the inner product $y^*y_0$ can be made positive and (b) is established. In general, we know from (5.4.13) that

$$|(y_0)^*x| \le \|y_0\|^D \|x\| = \|x\| \quad \text{for all } x \in C^n$$

The vector $y_0$ therefore satisfies (a) as well. Notice that (a) says that $\|y_0\|^D \le 1$ and (b) makes $\|y_0\|^D = 1$. □
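For a concrete norm the vector $y_0$ of Corollary (5.5.15) can often be written down directly. The following sketch is an illustration under stated assumptions, not the text's construction: it takes real vectors and the $l_1$ norm, whose dual is the $l_\infty$ norm, and uses the sign pattern of $y$ as $y_0$. The function names are hypothetical.

```python
def l1(x):
    return sum(abs(v) for v in x)

def linf(x):
    return max(abs(v) for v in x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_vector_l1(y):
    """A vector y0 with ||y0||_inf = 1 and y0^T y = ||y||_1 (real entries)."""
    return [1.0 if v >= 0 else -1.0 for v in y]

y = [3.0, -4.0, 0.0]
y0 = dual_vector_l1(y)
print(linf(y0))            # the dual norm of y0 is 1
print(dot(y0, y), l1(y))   # property (b): both equal ||y||_1
for x in ([1.0, 2.0, -3.0], [-5.0, 0.5, 0.0], [0.0, 0.0, 9.0]):
    print(abs(dot(y0, x)) <= l1(x))   # property (a) at a few sample points
```

Since $y_3 = 0$ here, the third entry of $y_0$ could be any number in $[-1, 1]$ without affecting (a) or (b), which shows the nonuniqueness mentioned in the corollary.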


Problems 

1. Show that a set S is closed if and only if it contains all its limit points. 
2. Show that every point of S is a limit point of S, so that the closure of 
S is just the set of limit points of S. 


3. Give an example of a set that is both open and closed. Give an 
example of a set that is neither open nor closed. 

4. Let $S$ be a compact set in a real or complex vector space $V$ with norm $\|\cdot\|$. Show that $S$ is closed and bounded. If $\{x_\alpha\} \subset S$ is a given infinite sequence, show that there is a countable subsequence $\{x_{\alpha_i}\} \subset \{x_\alpha\}$ and a point $x \in S$ such that $\lim_{i \to \infty} x_{\alpha_i} = x$. Show that any closed subset of a compact set is compact.
5. What happens in (5.5.4) if V is zero-dimensional? 


6. How might the unit ball of a vector seminorm be defined, and how 
does its shape differ from the unit ball of a norm? Sketch an example. 


7. If $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are vector norms on a vector space and if $\|\cdot\|$ is the vector norm defined by

$$\|x\| = \max\{\|x\|_\alpha, \|x\|_\beta\}$$

show that $B_{\|\cdot\|} = B_{\|\cdot\|_\alpha} \cap B_{\|\cdot\|_\beta}$.


8. Show that a vector norm $\|\cdot\|$ on $F^n$ ($R^n$ or $C^n$) is absolute if and only if

$$\|[\alpha_1 x_1, \alpha_2 x_2, \ldots, \alpha_n x_n]^T\| = \|[x_1, x_2, \ldots, x_n]^T\|$$

for all $[x_1, x_2, \ldots, x_n]^T \in F^n$ and all scalars $\alpha_1, \alpha_2, \ldots, \alpha_n \in F$ such that $|\alpha_1| = \cdots = |\alpha_n| = 1$.


In the next six problems, the following notation is used. Let $x, y \in V$ and let $\|\cdot\|$ be a vector norm on the real or complex vector space $V$. Then

$$L(x, y) = \{ z(t) = x + t(y - x) : 0 \le t \le 1 \}$$

denotes the usual (linear algebraic) line segment between $x$ and $y$, and

$$C(x, y; \|\cdot\|) = \{ z \in V : \|x - z\| + \|z - y\| = \|x - y\| \}$$

denotes the (metric) convex hull of $x$ and $y$ with respect to the norm $\|\cdot\|$.


9. Show that $L(x, y) \subset C(x, y; \|\cdot\|)$ for all $x, y \in V$ and for any vector norm $\|\cdot\|$.


10. If $V = C^n$ and if the norm is the $l_2$ norm, show that $C(x, y; \|\cdot\|_2) = L(x, y)$ for all $x, y \in C^n$; that is, show that $\|x - y\|_2 = \|x - z\|_2 + \|z - y\|_2$ if and only if $z = x + t(y - x)$ for some $t \in [0, 1]$.


11. Show that $C(x, y; \|\cdot\|)$ is always an ordinary convex set; that is, show that if $z_1, z_2 \in C(x, y; \|\cdot\|)$, then $t z_1 + (1 - t) z_2 \in C(x, y; \|\cdot\|)$ for all $t \in [0, 1]$.


12. Consider $V = R^2$ over $R$ and show that $C((1,0), (0,1); \|\cdot\|_1)$ is the entire square whose vertices are at the points (0,0), (0,1), (1,1), and (1,0) in the plane. Hint: Show that (0,0) and (1,1) are in the $l_1$ convex hull of (0,1) and (1,0) and use Problem 11. Show that $C((1,0), (0,1); \|\cdot\|_\infty)$ is just the line segment $L((1,0), (0,1))$, however.


13. Consider $V = R^2$ over $R$ again and show that $C((1,1), (1,-1); \|\cdot\|_\infty)$ is the entire square whose vertices are at the points (0,0), (1,1), (2,0), and $(1,-1)$ in the plane. Hint: Show that (0,0) and (2,0) are in the $l_\infty$ convex hull of (1,1) and $(1,-1)$. Show that $C((1,1), (1,-1); \|\cdot\|_1)$ is just the line segment $L((1,1), (1,-1))$, however.


14. The metric convex hull of a set of $k$ points $S \subset V$, $k \ge 2$, may be defined to be the set of all $z \in V$ such that $z$ is in the metric convex hull of two points, each of which is in the metric convex hull of some pair of
points of S. Show that this agrees with the above definition when k =2 
and describe the /, convex hull of the set of unit orthonormal basis vec- 
tors fei, €z, ---, €n} in R”. What is the /, convex hull of this set? What is 
the ordinary linear algebraic convex hull of this set? 


Further Readings. See [Hou 64] for more discussion of geometrical as- 
pects of vector norms. The key idea for the proof of the duality theorem 
(the identification of the unit ball of the second dual of a norm or pre- 
norm with the intersection of all the half-spaces containing the unit ball 
of the norm or pre-norm) is used by Von Neumann in the paper cited at 
the end of Section (5.4). See [Val] for a detailed discussion of convex 
sets, convex hulls, half-spaces, and so forth. 


5.6 Matrix norms 


Since $M_n$ is itself a vector space of dimension $n^2$, one can measure the “size” of a matrix by using any vector norm on $C^{n^2}$. However, $M_n$ is not just a high-dimensional vector space; it has a natural multiplication operation, and it is often useful in making estimates to relate the “size” of $AB$ to the “sizes” of $A$ and $B$.

We call a function $|||\cdot||| : M_n \to R$ a matrix norm if for all $A, B \in M_n$ it satisfies the following five axioms:


(1) $|||A||| \ge 0$   Nonnegative
(1a) $|||A||| = 0$ if and only if $A = 0$   Positive
(2) $|||cA||| = |c|\,|||A|||$ for all complex scalars $c$   Homogeneous
(3) $|||A + B||| \le |||A||| + |||B|||$   Triangle inequality
(4) $|||AB||| \le |||A|||\,|||B|||$   Submultiplicative

Notice that properties (1)-(3) are identical to the axioms for a vector 
norm (5.1.1). A vector norm on matrices, that is, a function that satisfies 
(1)-(3) and not necessarily (4), is often called a generalized matrix norm. 
The notions of a matrix seminorm and a generalized matrix seminorm 
may also be defined via omission of axiom (1a). 

Since $|||A^2||| = |||AA||| \le |||A|||\,|||A||| = |||A|||^2$ for any matrix norm, it must be that $|||A||| \ge 1$ for any nonzero matrix $A$ for which $A^2 = A$. In particular, $|||I||| \ge 1$ for any matrix norm. If $A$ is invertible, then $I = AA^{-1}$, so $|||I||| = |||AA^{-1}||| \le |||A|||\,|||A^{-1}|||$, and we have the lower bound

$$|||A^{-1}||| \ge \frac{|||I|||}{|||A|||}$$

for any matrix norm $|||\cdot|||$.


Exercise. Show that if $|||\cdot|||$ is a matrix norm, then $|||A^k||| \le |||A|||^k$ for every $k = 1, 2, \ldots$ and all $A \in M_n$. Give an example of a vector norm on matrices for which this inequality is not true.

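Some of the vector norms introduced in (5.2) are matrix norms when applied to the vector space $M_n$ and some are not. The most familiar examples are the $l_p$ norms for $p = 1, 2, \infty$. They are already known to be vector norms, so one needs to verify only axiom (4).


Example. The $l_1$ norm defined for $A \in M_n$ by

$$\|A\|_1 = \sum_{i,j=1}^n |a_{ij}|$$

is a matrix norm because

$$\|AB\|_1 = \sum_{i,j=1}^n \left| \sum_{k=1}^n a_{ik} b_{kj} \right| \le \sum_{i,j,k=1}^n |a_{ik} b_{kj}| \le \sum_{i,j,k,m=1}^n |a_{ik} b_{mj}| = \left( \sum_{i,k=1}^n |a_{ik}| \right) \left( \sum_{m,j=1}^n |b_{mj}| \right) = \|A\|_1 \|B\|_1$$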

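Example. The Euclidean norm or $l_2$ norm defined for $A \in M_n$ by

$$\|A\|_2 = \left( \sum_{i,j=1}^n |a_{ij}|^2 \right)^{1/2}$$

is a matrix norm because

$$\|AB\|_2^2 = \sum_{i,j=1}^n \left| \sum_{k=1}^n a_{ik} b_{kj} \right|^2 \le \sum_{i,j=1}^n \left( \sum_{k=1}^n |a_{ik}|^2 \right) \left( \sum_{m=1}^n |b_{mj}|^2 \right) = \left( \sum_{i,k=1}^n |a_{ik}|^2 \right) \left( \sum_{m,j=1}^n |b_{mj}|^2 \right) = \|A\|_2^2 \|B\|_2^2$$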

This inequality is just the Cauchy-Schwarz inequality. When applied to matrices, this norm is sometimes called the Frobenius norm, the Schur norm, or the Hilbert-Schmidt norm. Notice that if $A = [a_1\ a_2\ \cdots\ a_n] \in M_n$ is written in terms of its column vectors $a_i \in C^n$, then

$$\|A\|_2^2 = \|a_1\|_2^2 + \cdots + \|a_n\|_2^2$$

Since the $l_2$ norm on $C^n$ is unitarily invariant, we have the important fact that

$$\|UA\|_2^2 = \|Ua_1\|_2^2 + \cdots + \|Ua_n\|_2^2 = \|a_1\|_2^2 + \cdots + \|a_n\|_2^2 = \|A\|_2^2$$

whenever $U \in M_n$ is unitary. Since $\|B^*\|_2 = \|B\|_2$ for all $B \in M_n$, this implies that

$$\|UAV\|_2 = \|AV\|_2 = \|V^*A^*\|_2 = \|A^*\|_2 = \|A\|_2$$

whenever $U, V \in M_n$ are unitary. Thus, the $l_2$ norm on $M_n$ is a unitarily invariant matrix norm.
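Both properties of the $l_2$ (Frobenius) norm just established are easy to check numerically. The sketch below is illustrative and assumes real $2 \times 2$ matrices, with plane rotations standing in for general unitary matrices; the helper names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def frobenius(A):
    """The l2 norm of A regarded as a vector of its entries."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def rotation(theta):
    """A real orthogonal (hence unitary) 2-by-2 matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, -1.0], [2.0, 5.0]]
U, V = rotation(0.7), rotation(-1.9)

# Submultiplicativity: ||AB||_2 <= ||A||_2 ||B||_2
print(frobenius(matmul(A, B)) <= frobenius(A) * frobenius(B))
# Unitary invariance: ||UAV||_2 = ||A||_2 up to rounding
print(abs(frobenius(matmul(matmul(U, A), V)) - frobenius(A)) < 1e-12)
```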


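Example. The $l_\infty$ norm defined for $A \in M_n$ by

$$\|A\|_\infty = \max_{1 \le i, j \le n} |a_{ij}|$$

is a norm on the vector space $M_n$ but is not a matrix norm. Consider the matrix $J = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \in M_2$ and compute $J^2 = 2J$, $\|J\|_\infty = 1$, $\|J^2\|_\infty = \|2J\|_\infty = 2\|J\|_\infty = 2$. It is not the case that $\|J^2\|_\infty \le \|J\|_\infty^2$, and hence $\|\cdot\|_\infty$ is not a submultiplicative norm. However, if we define

$$|||A||| = n\|A\|_\infty, \quad A \in M_n$$

then we have

$$|||AB||| = n \max_{1 \le i,j \le n} \left| \sum_{k=1}^n a_{ik} b_{kj} \right| \le n \max_{1 \le i,j \le n} \sum_{k=1}^n |a_{ik} b_{kj}| \le n \max_{1 \le i,j \le n} \sum_{k=1}^n \|A\|_\infty \|B\|_\infty = n\|A\|_\infty\, n\|B\|_\infty = |||A|||\,|||B|||$$

Thus, only a minor modification of the vector norm $\|\cdot\|_\infty$ is required to make it a matrix norm.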
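The counterexample with $J$ and the effect of the factor $n$ can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names, using exact integer arithmetic.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_entry(A):
    """The l_inf norm of A regarded as a vector of its entries."""
    return max(abs(a) for row in A for a in row)

def scaled(A):
    """The modified norm n * ||A||_inf for an n-by-n matrix A."""
    return len(A) * max_entry(A)

J = [[1, 1], [1, 1]]
J2 = matmul(J, J)                        # equals 2J
print(max_entry(J2), max_entry(J) ** 2)  # 2 1: submultiplicativity fails
print(scaled(J2), scaled(J) ** 2)        # 4 4: the scaled norm satisfies axiom (4) here
```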


Associated with each vector norm $\|\cdot\|$ on $C^n$ is a natural matrix norm $|||\cdot|||$ that is “induced” by $\|\cdot\|$ on $M_n$. The norm $|||\cdot|||$ is constructed from $\|\cdot\|$, and this construction adds to the list of methods for producing one norm from another.


5.6.1 Definition. Let $\|\cdot\|$ be a vector norm on $C^n$. Define $|||\cdot|||$ on $M_n$ by

$$|||A||| = \max_{\|x\| = 1} \|Ax\|$$

The “max” in the above definition (rather than “sup”) is justified since $\|Ax\|$ is a continuous function of $x$ and the unit ball $B_{\|\cdot\|}$ is a compact set (see Appendix E).


Exercise. Show that the norm (5.6.1) may also be computed in the following equivalent ways:

$$|||A||| = \max_{\|x\| = 1} \|Ax\| = \max_{\|x\| \le 1} \|Ax\| = \max_{\|x\|_\alpha = 1} \frac{\|Ax\|}{\|x\|}$$

where $\|\cdot\|_\alpha$ is any vector norm.


5.6.2 Theorem. The function $|||\cdot|||$ defined in (5.6.1) is a matrix norm on $M_n$, $\|Ax\| \le |||A|||\,\|x\|$ for all $A \in M_n$ and all $x \in C^n$, and $|||I||| = 1$.


Proof: Axiom (1) at the beginning of this section follows from the fact that $|||A|||$ is the maximum of a nonnegative valued function, and (1a) follows from the fact that $Ax = 0$ for all $x$ precisely when $A = 0$. Axiom (2) follows from the calculation

$$|||cA||| = \max \|cAx\| = \max |c|\,\|Ax\| = |c| \max \|Ax\| = |c|\,|||A|||$$

Similarly, the triangle inequality (3) is inherited, since

$$|||A + B||| = \max \|(A + B)x\| = \max \|Ax + Bx\| \le \max (\|Ax\| + \|Bx\|) \le \max \|Ax\| + \max \|Bx\| = |||A||| + |||B|||$$

The submultiplicative axiom (4) follows from the fact that

$$|||AB||| = \max_{x \ne 0} \frac{\|ABx\|}{\|x\|} = \max_{x \ne 0} \frac{\|ABx\|}{\|Bx\|} \cdot \frac{\|Bx\|}{\|x\|} \le \max_{y \ne 0} \frac{\|Ay\|}{\|y\|} \max_{x \ne 0} \frac{\|Bx\|}{\|x\|} = |||A|||\,|||B|||$$

where we assume, without loss of generality, that the maximum is taken over only those $x$ that are not in the null space of $B$. For the next assertion, we observe that if $x \ne 0$, then $\|A(x/\|x\|)\| \le |||A|||$ because of the definition of this norm as a maximum. By homogeneity of the vector norm we obtain $\|Ax\| \le |||A|||\,\|x\|$, which also holds when $x = 0$. Finally,

$$|||I||| = \max_{\|x\| = 1} \|Ix\| = \max_{\|x\| = 1} \|x\| = 1 \qquad \Box$$


294 Norms for vectors and matrices 


5.6.3 Definition. We say that the matrix norm ||e|| defined in (5.6.1) is 
the matrix norm induced by the vector norm |]. It is sometimes called 
the operator norm or lub (least upper bound) norm associated with the 
vector norm ||. 

Notice that the operator norm is a matrix norm as a consequence of 
general properties of all vector norms. Therefore, one way to prove that 
a certain function on M, is a matrix norm is to show that it is induced by 
some vector norm. We shall adopt this strategy when we discuss an 
important matrix norm called the spectral norm. 

The inequality in the statement of Theorem (5.6.2) says that the vector
norm $|\cdot|$ is compatible with the induced matrix norm $\|\cdot\|$, and this
theorem shows that associated with any vector norm on $\mathbf{C}^n$ there is a
compatible matrix norm on $M_n$. The theorem also gives the necessary
condition $\|I\| = 1$ for a matrix norm $\|\cdot\|$ to be induced by some vector
norm; unfortunately, this necessary condition is not also sufficient.

We next note several important examples of matrix norms that are
induced by familiar $l_p$ norms but can also be calculated independent of
the definition (5.6.1). In each case, we take $A = [a_{ij}] \in M_n$.


5.6.4 The maximum column sum matrix norm $\|\cdot\|_1$ is defined on $M_n$ by

$$\|A\|_1 \equiv \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$$

The norm $\|\cdot\|_1$ is induced by the $l_1$ vector norm and hence must be a
matrix norm. One can show this as follows. Write $A \in M_n$ in terms of its
columns as $A = [a_1 \cdots a_n]$. Then $\|A\|_1 = \max_{1 \le i \le n} |a_i|_1$. If $x = [x_i]$, then

$$|Ax|_1 = |x_1 a_1 + \cdots + x_n a_n|_1 \le \sum_{i=1}^n |x_i a_i|_1 = \sum_{i=1}^n |x_i|\,|a_i|_1 \le \sum_{i=1}^n |x_i| \Big( \max_{1 \le k \le n} |a_k|_1 \Big) = \sum_{i=1}^n |x_i|\,\|A\|_1 = |x|_1 \|A\|_1$$

Thus, $\max_{|x|_1 = 1} |Ax|_1 \le \|A\|_1$. If we now choose $x = e_k$ (the $k$th unit
basis vector), then for any $k = 1, 2, \ldots, n$ we have

$$\max_{|x|_1 = 1} |Ax|_1 \ge |1 \cdot a_k|_1 = |a_k|_1$$

and hence

$$\max_{|x|_1 = 1} |Ax|_1 \ge \max_{1 \le k \le n} |a_k|_1 = \|A\|_1$$

Since we have now proved that the matrix norm induced by the $l_1$ vector
norm is both an upper bound and a lower bound on $\|A\|_1$, we are done.
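The two halves of this argument can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text; the helper names (`col_sum_norm`, `l1`, `matvec`) and the sample matrix are illustrative assumptions.

```python
def col_sum_norm(A):
    """Maximum column sum: max over j of sum_i |a_ij|."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def l1(x):
    """l1 vector norm: sum of absolute values."""
    return sum(abs(t) for t in x)

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

# Illustrative matrix (an assumption, not from the text).
A = [[1, -2, 0],
     [3, 4, -1],
     [0, 1, 5]]
n = len(A)

norm_A = col_sum_norm(A)   # column sums are 4, 7 and 6

# Upper bound: |Ax|_1 <= ||A||_1 |x|_1 for a sample vector x.
x = [0.5, -0.25, 0.25]
upper_ok = l1(matvec(A, x)) <= norm_A * l1(x)

# Lower bound: the unit basis vectors e_k already attain the maximum.
attained = max(l1(matvec(A, [1 if i == k else 0 for i in range(n)]))
               for k in range(n))
```

Only the basis vectors need to be tried for the lower bound, which is exactly why the induced norm reduces to a finite maximum over columns.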


Exercise. Prove directly from the definition that $\|\cdot\|_1$ is a matrix norm.


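5.6.5 The maximum row sum matrix norm $\|\cdot\|_\infty$ is defined on $M_n$ by

$$\|A\|_\infty \equiv \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$$

The norm $\|\cdot\|_\infty$ is induced by the $l_\infty$ vector norm and hence must be a
matrix norm. The argument is similar to the proof for the maximum
column sum norm. We compute

$$|Ax|_\infty = \max_{1 \le i \le n} \Big| \sum_{j=1}^n a_{ij} x_j \Big| \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij} x_j| \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|\,|x|_\infty = \|A\|_\infty |x|_\infty$$

and hence $\max_{|x|_\infty = 1} |Ax|_\infty \le \|A\|_\infty$. If $A = 0$ there is nothing to prove,
so we may assume that $A \ne 0$. Suppose the $k$th row of $A$ is nonzero and
define the vector $z = [z_i] \in \mathbf{C}^n$ by

$$z_i = \frac{\bar{a}_{ki}}{|a_{ki}|} \ \text{if } a_{ki} \ne 0, \qquad z_i = 1 \ \text{if } a_{ki} = 0$$

Then $|z|_\infty = 1$, $a_{kj} z_j = |a_{kj}|$ for all $j = 1, 2, \ldots, n$, and

$$\max_{|x|_\infty = 1} |Ax|_\infty \ge |Az|_\infty = \max_{1 \le i \le n} \Big| \sum_{j=1}^n a_{ij} z_j \Big| \ge \Big| \sum_{j=1}^n a_{kj} z_j \Big| = \sum_{j=1}^n |a_{kj}|$$

Thus,

$$\max_{|x|_\infty = 1} |Ax|_\infty \ge \max_{1 \le k \le n} \sum_{j=1}^n |a_{kj}| = \|A\|_\infty$$

and we are done.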
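The extremal vector $z$ used in this proof is easy to build explicitly. The sketch below, in plain Python, is an illustration rather than part of the original text; the function names and the sample complex matrix are assumptions chosen for the example.

```python
def row_sum_norm(A):
    """Maximum row sum: max over i of sum_j |a_ij|."""
    return max(sum(abs(a) for a in row) for row in A)

def linf(x):
    """l-infinity vector norm: largest absolute value of an entry."""
    return max(abs(t) for t in x)

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def extremal_vector(A):
    """Build z with |z|_inf = 1 and |Az|_inf = ||A||_inf, as in the proof."""
    # Pick a row k with the largest absolute row sum.
    k = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
    # z_i = conj(a_ki)/|a_ki| if a_ki != 0, and z_i = 1 otherwise.
    return [a.conjugate() / abs(a) if a != 0 else 1 for a in A[k]]

# Illustrative complex matrix (an assumption, not from the text).
A = [[1, -2j],
     [3j, -0.5]]

z = extremal_vector(A)
lhs = linf(matvec(A, z))   # |Az|_inf
rhs = row_sum_norm(A)      # ||A||_inf
```

Each product $a_{kj} z_j$ equals $|a_{kj}|$, so the $k$th entry of $Az$ is the full absolute row sum.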


Exercise. Verify directly from the definition that $\|\cdot\|_\infty$ is a matrix norm
on $M_n$.


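5.6.6 The spectral norm $\|\cdot\|_2$ is defined on $M_n$ by

$$\|A\|_2 \equiv \max\{\sqrt{\lambda} : \lambda \text{ is an eigenvalue of } A^*A\}$$

Notice that if $A^*Ax = \lambda x$ and $x \ne 0$, then $x^*A^*Ax = |Ax|_2^2 = \lambda |x|_2^2$, so
$\lambda \ge 0$ and $\sqrt{\lambda}$ is real and nonnegative.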
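For a real 2-by-2 matrix the definition can be evaluated in closed form, since $A^*A = A^TA$ is a symmetric 2-by-2 matrix whose largest eigenvalue is given by the quadratic formula. The following plain-Python sketch is illustrative only; the function name and the sample matrix are assumptions.

```python
import math

def spectral_norm_2x2(A):
    """Spectral norm of a real 2-by-2 matrix via the eigenvalues of A^T A."""
    (a, b), (c, d) = A
    p = a * a + c * c          # (A^T A)_11
    q = a * b + c * d          # (A^T A)_12 = (A^T A)_21
    r = b * b + d * d          # (A^T A)_22
    # Largest root of the characteristic polynomial of [[p, q], [q, r]].
    lam_max = ((p + r) + math.sqrt((p - r) ** 2 + 4 * q * q)) / 2
    return math.sqrt(lam_max)

# Illustrative matrix (an assumption): A^T A = [[25, 20], [20, 25]],
# whose eigenvalues are 45 and 5.
A = [[3, 0],
     [4, 5]]
norm2 = spectral_norm_2x2(A)
```

For this matrix the maximum column sum is 7 and the maximum row sum is 9, so the computed value also satisfies $\|A\|_2^2 \le \|A\|_1 \|A\|_\infty$.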




Exercise. If $B$ is a normal matrix and $B = U^*\Lambda U$ with $U$ unitary and
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, show that

$$|x^*Bx| \le \max\{|\lambda| : \lambda \text{ is an eigenvalue of } B\}\,|x|_2^2$$


Exercise. Show that $|Ax|_2^2 = x^*A^*Ax$ for all $x \in \mathbf{C}^n$ and use the previous
exercise to show that $\|\cdot\|_2$ is the matrix norm induced by the Euclidean
vector norm $|\cdot|_2$. Conclude from this that the spectral norm is in fact a
matrix norm.

Exercise. Show that $\|UAV\|_2 = \|A\|_2$ for any $A \in M_n$ and any unitary
matrices $U, V \in M_n$. Thus, the spectral norm is a unitarily invariant
matrix norm.


We next show that one matrix norm may be transformed into another 
by a fixed similarity. 


5.6.7 Theorem. If $\|\cdot\|$ is a matrix norm on $M_n$ and if $S \in M_n$ is
nonsingular, then

$$\|A\|_S \equiv \|S^{-1}AS\| \quad \text{for all } A \in M_n$$

is a matrix norm.


Proof: The axioms (1), (1a), (2), and (3) are verified in a straightforward
manner for $\|\cdot\|_S$. The submultiplicativity of $\|\cdot\|_S$ follows from the
calculation

$$\|AB\|_S = \|S^{-1}ABS\| = \|(S^{-1}AS)(S^{-1}BS)\| \le \|S^{-1}AS\|\,\|S^{-1}BS\| = \|A\|_S \|B\|_S$$ □

Theorem (5.6.7) can be of great use in tailoring a matrix norm for a
specific purpose. Some applications of this type are developed here and
in the following section.

One important area of application of matrix norms is in giving bounds
for the spectrum of a matrix.


5.6.8 Definition. The spectral radius $\rho(A)$ of a matrix $A \in M_n$ is

$$\rho(A) \equiv \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}$$

Observe that if $\lambda$ is any eigenvalue of $A$, then $|\lambda| \le \rho(A)$; moreover, there
is at least one eigenvalue $\lambda$ for which $|\lambda| = \rho(A)$. If $Ax = \lambda x$, $x \ne 0$, and if
$|\lambda| = \rho(A)$, consider the matrix $X \in M_n$, all the columns of which are
equal to the eigenvector $x$, and observe that $AX = \lambda X$. If $\|\cdot\|$ is any
matrix norm,

$$|\lambda|\,\|X\| = \|\lambda X\| = \|AX\| \le \|A\|\,\|X\|$$

and therefore $|\lambda| = \rho(A) \le \|A\|$. This is a proof of the following theorem.


5.6.9 Theorem. If $\|\cdot\|$ is any matrix norm and if $A \in M_n$, then
$\rho(A) \le \|A\|$.


Exercise. Give an example of a vector norm $|\cdot|$ on matrices and a matrix
$A \in M_n$ such that $|A| < \rho(A)$.


Exercise. Let $\|\cdot\|$ be a matrix norm on $M_n$, and consider the mapping
$F : \mathbf{C}^n \to M_n$ defined by $F(x) \equiv [x\ x\ \ldots\ x]$ = the matrix in $M_n$, all of whose
columns are just $x$. Show that the function $|\cdot|$ defined on $\mathbf{C}^n$ by $|x| \equiv
\|F(x)\|$ is a norm on $\mathbf{C}^n$ and that $|Ax| \le \|A\|\,|x|$ for all $x \in \mathbf{C}^n$ and all
$A \in M_n$. This inequality says that the vector norm $|\cdot|$ is compatible with
the matrix norm $\|\cdot\|$, and this exercise shows that any matrix norm on
$M_n$ has a compatible vector norm on $\mathbf{C}^n$.


Although the spectral radius function is not itself a matrix or vector
norm on $M_n$ (see Problem 19), for each fixed $A \in M_n$, it is the greatest
lower bound for the values of all matrix norms of $A$.


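5.6.10 Lemma. Let $A \in M_n$ and $\varepsilon > 0$ be given. There is a matrix norm
$\|\cdot\|$ such that $\rho(A) \le \|A\| \le \rho(A) + \varepsilon$.


Proof: By the Schur triangularization theorem (2.3.1), there is a unitary
matrix $U$ and an upper triangular matrix $\Delta = [d_{ij}]$, with the eigenvalues
$\lambda_1, \ldots, \lambda_n$ of $A$ on its main diagonal, such that $A = U \Delta U^*$. Set
$D_t \equiv \mathrm{diag}(t, t^2, t^3, \ldots, t^n)$ and compute

$$D_t \Delta D_t^{-1} = \begin{bmatrix}
\lambda_1 & t^{-1} d_{12} & t^{-2} d_{13} & \cdots & t^{-n+1} d_{1n} \\
0 & \lambda_2 & t^{-1} d_{23} & \cdots & t^{-n+2} d_{2n} \\
0 & 0 & \lambda_3 & \cdots & t^{-n+3} d_{3n} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & t^{-1} d_{n-1,n} \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}$$

Thus, for $t > 0$ large enough, we can be certain that the sum of all the
absolute values of the off-diagonal entries of $D_t \Delta D_t^{-1}$ is less than $\varepsilon$. In
particular, we can be sure that $\|D_t \Delta D_t^{-1}\|_1 \le \rho(A) + \varepsilon$ for large enough $t$.
Thus, if we define the matrix norm $\|\cdot\|$ by

$$\|B\| \equiv \|D_t U^* B U D_t^{-1}\|_1 = \|(U D_t^{-1})^{-1} B (U D_t^{-1})\|_1$$

for any $B \in M_n$, and if we choose $t$ large enough, then we will have
constructed a matrix norm such that $\|A\| \le \rho(A) + \varepsilon$. Since $\|A\| \ge \rho(A)$ for
any matrix norm, we are done. □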
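The effect of the diagonal scaling in this proof can be seen on a small triangular example. The sketch below is plain Python and only illustrative; the matrix `T` stands in for the triangular factor, and the helper names are assumptions.

```python
def col_sum_norm(A):
    """Maximum column sum matrix norm."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def scale(T, t):
    """Entries of D_t T D_t^{-1} with D_t = diag(t, t^2, ..., t^n).

    The (i, j) entry of T is multiplied by t**(i - j), so entries above
    the diagonal shrink as t grows while the diagonal is unchanged.
    """
    n = len(T)
    return [[T[i][j] * t ** (i - j) for j in range(n)] for i in range(n)]

# Illustrative upper triangular matrix (an assumption): spectral radius 0.5,
# but large off-diagonal entries.
T = [[0.5, 10.0, 10.0],
     [0.0, 0.5, 10.0],
     [0.0, 0.0, 0.5]]
rho = 0.5

norms = {t: col_sum_norm(scale(T, t)) for t in (1, 10, 1000)}
```

With `t = 1` the norm is 20.5, far above the spectral radius; as `t` grows, the scaled norm decreases toward 0.5 but never drops below it, in agreement with the lemma.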


Exercise. Explain why the preceding results show that $\rho(A) =
\inf\{\|A\| : \|\cdot\| \text{ is a matrix norm}\}$.


We are interested in characterizing matrices $A$ such that $A^k \to 0$
as $k \to \infty$. The following result is the last tool we need to attack this
problem.


5.6.11 Lemma. Let $A \in M_n$ be a given matrix. If there is a matrix norm
$\|\cdot\|$ such that $\|A\| < 1$, then $\lim_{k \to \infty} A^k = 0$; that is, all the entries of $A^k$
tend to zero as $k \to \infty$.


Proof: If $\|A\| < 1$, then $\|A^k\| \le \|A\|^k \to 0$ as $k \to \infty$. This says that $A^k \to 0$
with respect to the norm $\|\cdot\|$, but since all vector norms on the
$n^2$-dimensional space $M_n$ are equivalent, it must also be the case that $A^k \to 0$ with
respect to the vector norm $|\cdot|_\infty$. □


Exercise. Give an example of a matrix $A$ and two matrix norms $\|\cdot\|_\alpha$ and
$\|\cdot\|_\beta$ such that $\|A\|_\alpha < 1$ and $\|A\|_\beta > 1$. Conclusion?


Matrices $A \in M_n$ such that $\lim_{k \to \infty} A^k = 0$ are called convergent and
are important in many applications, for example, in the analysis of
iterative processes. It is therefore important to be able to characterize
convergent matrices.


5.6.12 Theorem. Let $A \in M_n$. Then $\lim_{k \to \infty} A^k = 0$ if and only if
$\rho(A) < 1$.


Proof: If $A^k \to 0$ and if $x \ne 0$ is a vector such that $Ax = \lambda x$, then
$A^k x = \lambda^k x \to 0$ only if $|\lambda| < 1$. Since this inequality must hold for every
eigenvalue of $A$, we conclude that $\rho(A) < 1$. Conversely, if $\rho(A) < 1$, then
by Lemma (5.6.10) there is some matrix norm $\|\cdot\|$ such that $\|A\| < 1$.
Thus, $A^k \to 0$ as $k \to \infty$ by Lemma (5.6.11). □
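The theorem says that convergence depends only on the spectral radius, not on any particular norm being less than 1. A minimal plain-Python sketch, with an illustrative matrix and helper names that are assumptions rather than anything from the text:

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def row_sum_norm(A):
    """Maximum row sum matrix norm."""
    return max(sum(abs(a) for a in row) for row in A)

# Illustrative matrix (an assumption): both eigenvalues are 0.5, so the
# spectral radius is 0.5, yet the maximum row sum norm is 4.5 > 1.
A = [[0.5, 4.0],
     [0.0, 0.5]]

P = [[1.0, 0.0], [0.0, 1.0]]
largest_entry = []                 # largest |entry| of A^k for k = 1..60
for k in range(1, 61):
    P = matmul(P, A)
    largest_entry.append(max(abs(v) for row in P for v in row))
```

The off-diagonal entry of $A^k$ is $4k(0.5)^{k-1}$, so the entries do not shrink at first but are eventually forced to zero.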


Exercise. Consider the matrix A = Ki 172] €e Mz. Compute A“ and p(A*) 


explicitly for $k = 2, 3, \ldots$. Show that $\rho(A^k) = [\rho(A)]^k$. How do the
following behave as $k \to \infty$? The entries of $A^k$; $\|A^k\|_1$; $\|A^k\|_\infty$; $\|A^k\|_2$.




. _f 24 
Exercise. Let A= [i 13 i n] , and define a sequence of vectors {x} e C? 


by the recursion $x^{(k+1)} = Ax^{(k)}$, $k = 0, 1, \ldots$. Show that, regardless of the
initial vector $x^{(0)}$ chosen, $x^{(k)} \to 0$ as $k \to \infty$.


Sometimes one needs bounds on the size of the entries of $A^k$ as $k \to \infty$.
One useful bound is an immediate consequence of the previous theorem.


5.6.13 Corollary. Let $A \in M_n$ be a given matrix, and let $\varepsilon > 0$ be given.
There is a constant $C = C(A, \varepsilon)$ such that

$$|(A^k)_{ij}| \le C(\rho(A) + \varepsilon)^k$$

for all $k = 1, 2, 3, \ldots$ and all $i, j = 1, 2, 3, \ldots, n$.


Proof: Since the matrix $\tilde{A} \equiv [\rho(A) + \varepsilon]^{-1} A$ has spectral radius strictly less
than 1, it is convergent and hence $\tilde{A}^k \to 0$ as $k \to \infty$. In particular, the
elements of the sequence $\{\tilde{A}^k\}$ are bounded, so there is some finite $C > 0$
such that $|(\tilde{A}^k)_{ij}| \le C$ for all $k = 1, 2, 3, \ldots$ and all $i, j = 1, 2, \ldots, n$. This is
the asserted bound. □


. 1 
Exercise. Let A = [ H ah compute A‘ explicitly, and show that one may 
not always take e =0 in (5.6.13). 


Even though it is not accurate to say that individual entries of $A^k$
behave like $\rho(A)^k$ as $k \to \infty$, the sequence $\{\|A^k\|\}$ does have this
asymptotic behavior for any matrix norm $\|\cdot\|$.


5.6.14 Corollary. Let $\|\cdot\|$ be a matrix norm on $M_n$. Then

$$\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$$

for all $A \in M_n$.


Proof: Since $\rho(A)^k = \rho(A^k) \le \|A^k\|$, we have that $\rho(A) \le \|A^k\|^{1/k}$ for all
$k = 1, 2, \ldots$. If $\varepsilon > 0$ is given, the matrix $\tilde{A} \equiv [\rho(A) + \varepsilon]^{-1} A$ has spectral
radius strictly less than 1 and hence it is convergent. Thus, $\|\tilde{A}^k\| \to 0$ as
$k \to \infty$ and there is some $N = N(\varepsilon, A)$ such that $\|\tilde{A}^k\| < 1$ for all $k \ge N$.
This is just the statement that $\|A^k\| \le [\rho(A) + \varepsilon]^k$ for all $k \ge N$ or that
$\|A^k\|^{1/k} \le \rho(A) + \varepsilon$ for all $k \ge N$. Since $\rho(A) \le \|A^k\|^{1/k}$ for all $k$ and since
$\varepsilon > 0$ is arbitrary, we conclude that $\lim_{k \to \infty} \|A^k\|^{1/k}$ exists and equals
$\rho(A)$. □
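The limit in (5.6.14) can be observed numerically, although convergence may be slow. The following plain-Python sketch is illustrative; the matrix, the choice of the maximum row sum norm, and the sampled values of $k$ are assumptions.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def row_sum_norm(A):
    """Maximum row sum matrix norm."""
    return max(sum(abs(a) for a in row) for row in A)

# Illustrative matrix (an assumption) with spectral radius 0.5.
A = [[0.5, 4.0],
     [0.0, 0.5]]

P = [[1.0, 0.0], [0.0, 1.0]]
estimates = {}                     # k -> ||A^k||^(1/k)
for k in range(1, 201):
    P = matmul(P, A)
    if k in (1, 10, 50, 200):
        estimates[k] = row_sum_norm(P) ** (1.0 / k)
```

Here $\|A^k\|_\infty = (0.5)^k (1 + 8k)$, so the estimates decrease toward 0.5 from above, consistent with the inequality $\rho(A) \le \|A^k\|^{1/k}$ used in the proof.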




Questions about the convergence of infinite sequences or series of 
matrices can be treated with vector norms just as one treats infinite se- 
quences or series of vectors. 


Exercise. Let $\{A_k\} \subset M_n$ be a given infinite sequence of matrices. Show
that the series $\sum_{k=0}^\infty A_k$ converges to some matrix in $M_n$ if there is a
vector norm $|\cdot|$ on $M_n$ such that the numerical series $\sum_{k=0}^\infty |A_k|$ is
convergent (or even if its partial sums are bounded). Hint: Show that the
partial sums form a Cauchy sequence.


One special case for matrices that does not arise in the study of infinite 
series of vectors is the case of power series of matrices. But because of 
the submultiplicative property of matrix norms, it is easy to give a simple 
sufficient condition for convergence of matrix power series. 


5.6.15 Theorem. If $A \in M_n$, then the series $\sum_{k=0}^\infty a_k A^k$ converges if there
is a matrix norm $\|\cdot\|$ on $M_n$ such that the numerical series $\sum_{k=0}^\infty |a_k|\,\|A\|^k$
converges, or even if the partial sums of this series are bounded.


Exercise. Prove (5.6.15). 


Exercise. Show by example that it is possible that the series $\sum_{k=0}^\infty a_k A^k$
converges and the series $\sum_{k=0}^\infty |a_k|\,\|A\|^k$ diverges. This is analogous to
conditional convergence (convergence but not absolute convergence) for
numerical series.


Exercise. Let the function $f(z)$ be defined by the power series $f(z) =
\sum_{k=0}^\infty a_k z^k$, which has radius of convergence $R > 0$, and let $\|\cdot\|$ be a
matrix norm on $M_n$. Show that $f(A) \equiv \sum_{k=0}^\infty a_k A^k$ is well defined for all
$A \in M_n$ such that $\|A\| < R$. More generally, show that $f(A)$ is well
defined for all $A \in M_n$ such that $\rho(A) < R$.


Exercise. If $A$ is diagonalizable and $A = S^{-1} \Lambda S$, one sometimes defines
$f(A) \equiv S^{-1} f(\Lambda) S$, where $f(\Lambda) \equiv \mathrm{diag}(f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n))$. Show
that this definition of $f(A)$ agrees with the power series definition in the
preceding exercise if $A$ is diagonalizable. Is one of the two definitions
more general than the other?

Exercise. Show that the matrix exponential given by the power series

$$e^A \equiv \sum_{k=0}^\infty \frac{1}{k!} A^k$$

is well defined for every $A \in M_n$.




Exercise. How would you define cos(A)? For what A is this defined? 


5.6.16 Corollary. A matrix $A \in M_n$ is invertible if there is a matrix norm
$\|\cdot\|$ such that $\|I - A\| < 1$. If this condition is satisfied,

$$A^{-1} = \sum_{k=0}^\infty (I - A)^k$$


Proof: If $\|I - A\| < 1$, then the series

$$\sum_{k=0}^\infty (I - A)^k$$

converges to some matrix $C$ because the radius of convergence of the
series $\sum z^k$ is 1. But since

$$A \sum_{k=0}^N (I - A)^k = [I - (I - A)] \sum_{k=0}^N (I - A)^k = I - (I - A)^{N+1} \to I$$

as $N \to \infty$, we conclude that $C = A^{-1}$. □
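The series in (5.6.16) gives a simple, if not especially efficient, way to approximate an inverse. The plain-Python sketch below is illustrative only; the matrix, the number of terms, and the helper names are assumptions.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Illustrative matrix (an assumption): I - A has maximum row sum 0.5 < 1.
A = [[1.0, 0.5],
     [0.25, 1.0]]
n = len(A)
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
E = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]   # E = I - A

# Partial sum C = sum_{k=0}^{59} E^k of the series for A^{-1}.
C = [[0.0] * n for _ in range(n)]
term = I
for _ in range(60):
    C = [[C[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    term = matmul(term, E)

AC = matmul(A, C)
residual = max(abs(AC[i][j] - I[i][j]) for i in range(n) for j in range(n))
```

The residual $AC - I$ equals $-(I - A)^{60}$ up to rounding, which mirrors the telescoping identity in the proof.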


Exercise. Show that the preceding result is equivalent to the following
statement: If $\|\cdot\|$ is a matrix norm, and if $\|A\| < 1$, then $I - A$ is invertible
and

$$(I - A)^{-1} = \sum_{k=0}^\infty A^k$$


Exercise. Let $\|\cdot\|$ be a matrix norm on $M_n$, and suppose a given matrix
$A \in M_n$ has an "approximate inverse" $B \in M_n$, with the property that
$\|BA - I\| < 1$. Show that $A$ and $B$ are both invertible.


Exercise. If the matrix norm $\|\cdot\|$ has the property that $\|I\| = 1$ (which
would be the case if it were an induced norm), and if $A \in M_n$ is such that
$\|A\| < 1$, show that

$$\frac{1}{1 + \|A\|} \le \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}$$

Hint: Use the inequality $\|(I - A)^{-1}\| \le \sum_{k=0}^\infty \|A\|^k$ to get the upper
bound. Use the general inequality $\|B^{-1}\| \ge 1/\|B\|$ and the triangle
inequality for the lower bound.


Exercise. If $\|\cdot\|$ is a general matrix norm, all we know is that $\|I\| \ge 1$. In
this case, show that

$$\frac{\|I\|}{\|I\| + \|A\|} \le \|(I - A)^{-1}\| \le \frac{\|I\| - (\|I\| - 1)\|A\|}{1 - \|A\|}$$

whenever $\|A\| < 1$.


Exercise. If $A, B \in M_n$, if $A$ is invertible, and if $A + B$ is singular, show
that $\|B\| \ge 1/\|A^{-1}\|$ for any matrix norm $\|\cdot\|$. Thus, there is an intrinsic
limit to how well a nonsingular matrix can be approximated by a singular
one. Hint: $A + B = A(I + A^{-1}B)$. If $\|A^{-1}B\| < 1$, then $I + A^{-1}B$ would be
invertible, so it must be that $\|A^{-1}B\| \ge 1$.


One useful and easily computed criterion for invertibility follows easily 
from the last corollary. 


5.6.17 Corollary. Let $A = [a_{ij}] \in M_n$, and suppose that

$$|a_{ii}| > \sum_{j=1,\ j \ne i}^n |a_{ij}| \quad \text{for all } i = 1, 2, \ldots, n$$

Then $A$ is invertible.


Proof: The hypothesis ensures that all main diagonal entries $a_{ii}$ are
nonzero. Set $D \equiv \mathrm{diag}(a_{11}, \ldots, a_{nn})$, so that $D$ is an invertible diagonal matrix,
$D^{-1}A$ has all 1's on the main diagonal, the matrix $B = [b_{ij}] \equiv I - D^{-1}A$
has all 0's on the main diagonal, and $b_{ij} = -a_{ij}/a_{ii}$ if $i \ne j$. Consider the
maximum row sum norm $\|\cdot\|_\infty$. The hypothesis guarantees that $\|B\|_\infty < 1$,
so $I - B = D^{-1}A$ is invertible by (5.6.16), and hence $A$ is invertible. □
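The hypothesis and the key step of the proof are both easy to test in code. This is a minimal plain-Python sketch under stated assumptions: the function names and the sample matrix are hypothetical, chosen only to illustrate the construction of $B = I - D^{-1}A$.

```python
def is_strictly_diagonally_dominant(A):
    """True if |a_ii| > sum over j != i of |a_ij| for every row i."""
    return all(abs(row[i]) > sum(abs(a) for j, a in enumerate(row) if j != i)
               for i, row in enumerate(A))

def row_sum_norm(A):
    """Maximum row sum matrix norm."""
    return max(sum(abs(a) for a in row) for row in A)

def b_matrix(A):
    """B = I - D^{-1} A with D = diag(a_11, ..., a_nn); zero diagonal."""
    n = len(A)
    return [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(n)]
            for i in range(n)]

# Illustrative matrix (an assumption, not from the text).
A = [[4, 1, -2],
     [1, -5, 3],
     [0, 2, 3]]

dominant = is_strictly_diagonally_dominant(A)
norm_B = row_sum_norm(b_matrix(A))   # row sums of |B| are 3/4, 4/5, 2/3
```

Row $i$ of $B$ has absolute sum $\sum_{j \ne i} |a_{ij}| / |a_{ii}|$, so strict diagonal dominance is exactly the statement $\|B\|_\infty < 1$.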


A matrix that satisfies the hypothesis of (5.6.17) is said to be strictly 
diagonally dominant. This sufficient condition for invertibility is known as 
the Levy-Desplanques theorem, and it can be improved somewhat. See 
Sections (6.1), (6.2), and (6.4). 

We now consider in more detail the induced matrix norms defined in 
(5.6.1). These are some of the most familiar matrix norms, and they have 
an important minimality property. Because one often wishes to establish 
that a given matrix A is convergent by using the test ||A]| <1, it is natural 
to prefer matrix norms that are uniformly as small as possible. As we shall 
show, the entire class of induced matrix norms has this desirable property, 
and this property characterizes the class of induced matrix norms. 

Any two norms on a finite-dimensional space are equivalent, and so for
each two matrix norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ there is a least finite positive
constant $C_M(\alpha, \beta)$ such that $\|A\|_\alpha \le C_M(\alpha, \beta)\,\|A\|_\beta$ for all $A \in M_n$. This
constant can be computed as

$$C_M(\alpha, \beta) = \max_{A \ne 0} \frac{\|A\|_\alpha}{\|A\|_\beta}$$

If we reverse the roles of $\alpha$ and $\beta$, there is a similarly defined
least finite positive constant $C_M(\beta, \alpha)$ such that $\|A\|_\beta \le C_M(\beta, \alpha)\,\|A\|_\alpha$
for all $A \in M_n$. In general, there is no obvious relationship
between $C_M(\alpha, \beta)$ and $C_M(\beta, \alpha)$, but if we examine the table
in Problem 23 at the end of this section, we see that its upper left 3×3
corner is symmetric; that is, $C_M(\alpha, \beta) = C_M(\beta, \alpha)$ for any pair of the
matrix norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$. All three of these matrix
norms are induced norms, and this symmetry is a property of all induced
norms.

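5.6.18 Theorem. Let $|\cdot|_\alpha$ and $|\cdot|_\beta$ be two given vector norms on $\mathbf{C}^n$
and let $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ denote the respective induced matrix norms on
$M_n$, that is,

$$\|A\|_\alpha \equiv \max_{x \ne 0} \frac{|Ax|_\alpha}{|x|_\alpha} \quad \text{and} \quad \|A\|_\beta \equiv \max_{x \ne 0} \frac{|Ax|_\beta}{|x|_\beta}$$

Define

$$R_{\alpha\beta} \equiv \max_{x \ne 0} \frac{|x|_\alpha}{|x|_\beta} \quad \text{and} \quad R_{\beta\alpha} \equiv \max_{x \ne 0} \frac{|x|_\beta}{|x|_\alpha} \tag{5.6.19}$$

Then

$$\max_{A \ne 0} \frac{\|A\|_\alpha}{\|A\|_\beta} = R_{\alpha\beta} R_{\beta\alpha} \tag{5.6.20}$$

In particular,

$$\max_{A \ne 0} \frac{\|A\|_\alpha}{\|A\|_\beta} = \max_{A \ne 0} \frac{\|A\|_\beta}{\|A\|_\alpha} \tag{5.6.21}$$


Proof: Let $A \in M_n$ and $x \in \mathbf{C}^n$ be given, and suppose that $x \ne 0$ and
$Ax \ne 0$. Then

$$\frac{|Ax|_\alpha}{|x|_\alpha} = \frac{|Ax|_\alpha}{|Ax|_\beta}\,\frac{|Ax|_\beta}{|x|_\beta}\,\frac{|x|_\beta}{|x|_\alpha} \le R_{\alpha\beta}\,\frac{|Ax|_\beta}{|x|_\beta}\,R_{\beta\alpha}$$

an inequality that holds even if $Ax = 0$. Thus,

$$\|A\|_\alpha = \max_{x \ne 0} \frac{|Ax|_\alpha}{|x|_\alpha} \le R_{\alpha\beta} \Big( \max_{x \ne 0} \frac{|Ax|_\beta}{|x|_\beta} \Big) R_{\beta\alpha} = R_{\alpha\beta} R_{\beta\alpha} \|A\|_\beta$$

and hence

$$\frac{\|A\|_\alpha}{\|A\|_\beta} \le R_{\alpha\beta} R_{\beta\alpha} \tag{5.6.22}$$

for all nonzero $A \in M_n$.

Each of the two extrema in (5.6.19) is achieved for some nonzero
vector, so there are vectors $y, z \in \mathbf{C}^n$ such that $|y|_\beta = |z|_\alpha = 1$, $|y|_\alpha =
R_{\alpha\beta} |y|_\beta$, and $|z|_\beta = R_{\beta\alpha} |z|_\alpha$. By Corollary (5.5.15) there exists a vector
$z_0 \in \mathbf{C}^n$ such that

(a) $|z_0^* x| \le |x|_\beta$ for all $x \in \mathbf{C}^n$; and
(b) $z_0^* z = |z|_\beta$.

Consider the matrix $A_0 \equiv y z_0^*$. Using (b), we have

$$\frac{|A_0 z|_\alpha}{|z|_\alpha} = \frac{|y z_0^* z|_\alpha}{|z|_\alpha} = \frac{|y|_\alpha |z_0^* z|}{|z|_\alpha} = \frac{|y|_\alpha |z|_\beta}{|z|_\alpha}$$

so we have the lower bound

$$\|A_0\|_\alpha \ge \frac{|y|_\alpha |z|_\beta}{|z|_\alpha} = R_{\alpha\beta} R_{\beta\alpha} |y|_\beta$$

On the other hand, we can use (a) to obtain

$$\frac{|A_0 x|_\beta}{|x|_\beta} = \frac{|y z_0^* x|_\beta}{|x|_\beta} = \frac{|y|_\beta |z_0^* x|}{|x|_\beta} \le \frac{|y|_\beta |x|_\beta}{|x|_\beta} = |y|_\beta$$

and hence we have the upper bound

$$\|A_0\|_\beta \le |y|_\beta$$

Combining these two bounds, we have

$$\frac{\|A_0\|_\alpha}{\|A_0\|_\beta} \ge \frac{R_{\alpha\beta} R_{\beta\alpha} |y|_\beta}{|y|_\beta} = R_{\alpha\beta} R_{\beta\alpha}$$

which shows that equality is possible in (5.6.22) and establishes (5.6.20).
The assertion (5.6.21) follows because the right-hand side of the identity
(5.6.20) is symmetric in $\alpha$ and $\beta$. □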
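As a worked illustration (not part of the original text), take the $l_1$ and $l_\infty$ vector norms on $\mathbf{C}^n$ for $\alpha$ and $\beta$. The choices of $y$, $z$, and $z_0$ below are one convenient set of extremal vectors, under the normalization used in the proof.

```latex
% Worked example for (5.6.19)-(5.6.21) with alpha = l_1 and beta = l_infinity.
\begin{align*}
R_{1\infty} &= \max_{x \ne 0} \frac{|x|_1}{|x|_\infty} = n
  && \text{(attained at } y = [1\ 1\ \cdots\ 1]^T,\ |y|_\infty = 1,\ |y|_1 = n\text{)} \\
R_{\infty 1} &= \max_{x \ne 0} \frac{|x|_\infty}{|x|_1} = 1
  && \text{(attained at } z = e_1,\ |z|_1 = 1,\ |z|_\infty = 1\text{)} \\
z_0 &= e_1: \quad |z_0^* x| = |x_1| \le |x|_\infty, \quad z_0^* z = 1 = |z|_\infty \\
A_0 &= y z_0^* = [\,y\ 0\ \cdots\ 0\,], \quad
  \|A_0\|_1 = n, \quad \|A_0\|_\infty = 1 \\
\max_{A \ne 0} \frac{\|A\|_1}{\|A\|_\infty}
  &= R_{1\infty} R_{\infty 1} = n
  = \max_{A \ne 0} \frac{\|A\|_\infty}{\|A\|_1}
\end{align*}
```

The second maximum is attained at $A_0^T$, whose maximum row sum is $n$ and whose maximum column sum is 1.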

Is it possible that two different vector norms on C” could induce the 

same matrix norm on M,? According to the following consequence of 
(5.6.18), this can happen if and only if one of the vector norms is a con- 
stant scalar multiple of the other. 




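5.6.23 Corollary. Let $|\cdot|_\alpha$ and $|\cdot|_\beta$ be vector norms on $\mathbf{C}^n$, and let $\|\cdot\|_\alpha$
and $\|\cdot\|_\beta$ denote the respective induced matrix norms on $M_n$. Then
$\|A\|_\alpha = \|A\|_\beta$ for all $A \in M_n$ if and only if there is a positive constant $c$
such that $|x|_\alpha = c|x|_\beta$ for all $x \in \mathbf{C}^n$.


Proof: Observe that

$$R_{\beta\alpha} = \max_{x \ne 0} \frac{|x|_\beta}{|x|_\alpha} = \Big[ \min_{x \ne 0} \frac{|x|_\alpha}{|x|_\beta} \Big]^{-1} \ge \Big[ \max_{x \ne 0} \frac{|x|_\alpha}{|x|_\beta} \Big]^{-1} = \frac{1}{R_{\alpha\beta}}$$

Thus, we have the general inequality

$$R_{\alpha\beta} R_{\beta\alpha} \ge 1 \tag{5.6.24}$$

with equality if and only if

$$\min_{x \ne 0} \frac{|x|_\alpha}{|x|_\beta} = \max_{x \ne 0} \frac{|x|_\alpha}{|x|_\beta}$$

which can occur if and only if the function $|x|_\alpha / |x|_\beta$ is constant for all
$x \ne 0$. Thus, if $|x|_\alpha = c|x|_\beta$, we certainly have $R_{\alpha\beta} R_{\beta\alpha} = 1$ and hence
$\|A\|_\alpha \le \|A\|_\beta$ and $\|A\|_\beta \le \|A\|_\alpha$ for all $A \in M_n$ by (5.6.21); in this event,
$\|A\|_\alpha = \|A\|_\beta$ for all $A \in M_n$. Conversely, if the two induced matrix
norms are identical, then $R_{\alpha\beta} R_{\beta\alpha} = 1$ by (5.6.20) and hence equality holds
in (5.6.24) and the ratio $|x|_\alpha / |x|_\beta$ is constant by the preceding argument. □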


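5.6.25 Corollary. Let $|\cdot|_\alpha$ and $|\cdot|_\beta$ be vector norms on $\mathbf{C}^n$, and let $\|\cdot\|_\alpha$
and $\|\cdot\|_\beta$ denote the respective induced matrix norms on $M_n$. Then
$\|A\|_\alpha \le \|A\|_\beta$ for all $A \in M_n$ if and only if $\|A\|_\alpha = \|A\|_\beta$ for all $A \in M_n$.


Proof: If $\|A\|_\alpha \le \|A\|_\beta$ for all $A \in M_n$, then $R_{\alpha\beta} R_{\beta\alpha} \le 1$, which [because
of (5.6.24)] implies that $R_{\alpha\beta} R_{\beta\alpha} = 1$. Therefore, $\|A\|_\alpha \le \|A\|_\beta$ and $\|A\|_\beta \le
\|A\|_\alpha$ for all $A \in M_n$ by (5.6.21). □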


The last corollary says that no induced matrix norm can be uniformly 
dominated by another. What happens if we permit comparisons with 
other (not necessarily induced) matrix norms? 


5.6.26 Theorem. Let $\|\cdot\|$ be a given matrix norm on $M_n$, and let $\|\cdot\|_\alpha$
be a given induced matrix norm on $M_n$. Then

(a) There is an induced matrix norm $N(\cdot)$ on $M_n$ such that $N(A) \le
\|A\|$ for every $A \in M_n$; and

(b) $\|A\| \le \|A\|_\alpha$ for every $A \in M_n$ if and only if $\|A\| = \|A\|_\alpha$ for
every $A \in M_n$.


Proof: Define the vector norm $|\cdot|$ on $\mathbf{C}^n$ by

$$|x| \equiv \|X\|, \quad X \equiv [x\ x\ \ldots\ x] \in M_n \tag{5.6.27}$$

and consider the matrix norm $N(\cdot)$ on $M_n$ that is induced by $|\cdot|$. For any
$A \in M_n$, we have

$$N(A) = \max_{x \ne 0} \frac{|Ax|}{|x|} = \max_{x \ne 0} \frac{\|[Ax\ Ax\ \ldots\ Ax]\|}{\|[x\ x\ \ldots\ x]\|} = \max_{x \ne 0} \frac{\|AX\|}{\|X\|} \le \max_{x \ne 0} \frac{\|A\|\,\|X\|}{\|X\|} = \|A\| \tag{5.6.28}$$

(the inequality holds because $\|\cdot\|$ is a matrix norm), which establishes (a).
To prove (b), suppose that $\|A\| \le \|A\|_\alpha$ for all
$A \in M_n$. Then by (a) we have

$$N(A) \le \|A\| \le \|A\|_\alpha$$

for all $A \in M_n$. But $N(\cdot)$ and $\|\cdot\|_\alpha$ are both induced norms, so $N(A) =
\|A\|_\alpha$ by (5.6.25), and hence $\|A\| = \|A\|_\alpha$ for all $A \in M_n$. □


The preceding result is the motivation for the following definition. 


5.6.29 Definition. A matrix norm $\|\cdot\|$ on $M_n$ is a minimal matrix norm
if the only matrix norm $N(\cdot)$ on $M_n$ such that $N(A) \le \|A\|$ for all $A \in M_n$
is $N(\cdot) = \|\cdot\|$.

Assertion (b) of Theorem (5.6.26) says that every induced norm on 
M,, is minimal. Assertion (a) implies immediately that every minimal 
norm is induced. Thus, if one wants to use a matrix norm that cannot be 
uniformly improved upon (in terms of small values on all matrices), one 
should use an induced norm, and any norm with this optimality property 


must be an induced norm. 
The vector norm (5.6.27) is a special case of a whole family of vector
norms that can be constructed from a given matrix norm. Let $\|\cdot\|$ be a
given matrix norm on $M_n$, let $y \in \mathbf{C}^n$ be a given nonzero vector, and
define the function $|\cdot|_y : \mathbf{C}^n \to \mathbf{R}$ by

$$|x|_y \equiv \|x y^*\|, \quad y \in \mathbf{C}^n,\ y \ne 0 \tag{5.6.30}$$

Then $|\cdot|_y$ is a vector norm on $\mathbf{C}^n$ with the property that

$$|Ax|_y = \|A(x y^*)\| \le \|A\|\,\|x y^*\| = \|A\|\,|x|_y$$

for all $A \in M_n$. If $y = [1\ 1\ \ldots\ 1]^T$, then (5.6.30) reduces to (5.6.27). If we
denote by $N_y(\cdot)$ the matrix norm on $M_n$ that is induced by $|\cdot|_y$, this
inequality says that

$$N_y(A) \equiv \max_{x \ne 0} \frac{|Ax|_y}{|x|_y} \le \max_{x \ne 0} \frac{\|A\|\,|x|_y}{|x|_y} = \|A\| \quad \text{for all } A \in M_n \tag{5.6.31}$$

This is evidently a generalization of (5.6.26a).

If the matrix norm $\|\cdot\|$ is a minimal norm, then (5.6.31) implies
that $\|A\| = N_y(A)$ for all $A \in M_n$. Since the vector $y$ used in this
argument can be any nonzero vector, for a minimal matrix norm we have
$N_y(\cdot) = \|\cdot\| = N_z(\cdot)$ for all nonzero $y, z \in \mathbf{C}^n$.


5.6.32 Theorem. Let $\|\cdot\|$ be a matrix norm on $M_n$, and let $N_y(\cdot)$ be
the induced norm defined by (5.6.31) and (5.6.30). The following are
equivalent:

(a) $\|\cdot\|$ is an induced matrix norm.
(b) $\|\cdot\|$ is a minimal matrix norm.
(c) $\|\cdot\| = N_y(\cdot)$ for all nonzero $y \in \mathbf{C}^n$.


Proof: The assertion that (a) implies (b) is just (5.6.26b). We have just
observed that if $\|\cdot\|$ is minimal, then $\|\cdot\| = N_y(\cdot)$, so (b) implies (c). If
(c), then $\|\cdot\|$ is induced because $N_y(\cdot)$ is induced by definition. □


There is somewhat more to be gleaned from these observations. If
$N_y(\cdot) = \|\cdot\|$ for all nonzero $y \in \mathbf{C}^n$, then $N_y(\cdot) = N_z(\cdot)$ for all nonzero
$y, z \in \mathbf{C}^n$. But Corollary (5.6.23) says that the vector norm that induces a
given matrix norm is unique up to a scale factor, so $|\cdot|_y = c_{yz} |\cdot|_z$ for
some positive constant $c_{yz}$.


Exercise. If the matrix norm $\|\cdot\|$ on $M_n$ is induced by the vector norm $|\cdot|$
on $\mathbf{C}^n$, show that $\|y z^*\| = |y|\,|z|^D$, $|\cdot|_y = |y|^D\,|\cdot|$, and $c_{yz} = |y|^D / |z|^D$
for all nonzero $y, z \in \mathbf{C}^n$. The vector norm $|\cdot|^D$ is the dual of the vector
norm $|\cdot|$, as defined in (5.4.12).


5.6.33 Theorem. Let $\|\cdot\|$ be a given matrix norm on $M_n$ and let $|\cdot|_y$ be
the vector norm on $\mathbf{C}^n$ defined by (5.6.30). The following two assertions
are equivalent:

(a) For each pair of nonzero vectors $y, z \in \mathbf{C}^n$ there is a positive
constant $c_{yz}$ such that

$$|x|_y = c_{yz} |x|_z \quad \text{for all } x \in \mathbf{C}^n$$

(b) $\|x z^*\|\,\|z y^*\| = \|x y^*\|\,\|z z^*\|$ for all $x, y, z \in \mathbf{C}^n$ with $z \ne 0$.

If $\|\cdot\|$ is an induced matrix norm, then it satisfies the identity (b), and the
vector norms constructed from it by (5.6.30) satisfy (a).


Proof: If (a), then

$$\|x z^*\|\,\|z y^*\| = |x|_z |z|_y = (1/c_{yz}) |x|_y\, c_{yz} |z|_z = |x|_y |z|_z = \|x y^*\|\,\|z z^*\|$$

Conversely, if (b), then (a) follows with $c_{yz} \equiv \|z y^*\| / \|z z^*\|$. We have
already argued that if $N_y(\cdot) = \|\cdot\|$, then (a) [and hence (b) also] must
follow, and this will be the case if $\|\cdot\|$ is an induced norm by (5.6.32). □


Exercise. Any positive scalar multiple of an induced norm satisfies the
identity (5.6.33b). Show that the matrix norms $|\cdot|_1$ and $|\cdot|_2$ both satisfy
this identity, but that neither norm is a scalar multiple of an induced norm.


We saw in (5.6.2) that if $\|\cdot\|$ is an induced matrix norm, then $\|I\| = 1$.
This property is unfortunately not sufficient for a matrix norm to be an
induced norm. It is easy to show that the function

$$\|A\| \equiv \max\{\|A\|_1, \|A\|_\infty\} \tag{5.6.34}$$

defines a matrix norm on $M_n$ and that $\|I\| = 1$. But since $\|A\|_1 \le \|A\|$ for
all $A \in M_n$ and $\|A\|_1 < \|A\|$ for $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, $\|\cdot\|$ is not a minimal norm
and hence cannot be an induced norm.
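The claims about (5.6.34) are straightforward to check on small examples. The plain-Python sketch below is only an illustration; the helper names are assumptions, and the test matrix is one convenient choice with a row sum larger than every column sum.

```python
def col_sum_norm(A):
    """Maximum column sum matrix norm."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def row_sum_norm(A):
    """Maximum row sum matrix norm."""
    return max(sum(abs(a) for a in row) for row in A)

def max_norm(A):
    """The norm (5.6.34): the larger of the column sum and row sum norms."""
    return max(col_sum_norm(A), row_sum_norm(A))

I = [[1, 0],
     [0, 1]]
# Illustrative matrix (an assumption): column sums are 1 and 1, row sums 2 and 0.
A = [[1, 1],
     [0, 0]]

norm_I = max_norm(I)
strictly_smaller = col_sum_norm(A) < max_norm(A)
```

So the identity has norm 1, yet the column sum norm is strictly smaller than (5.6.34) on at least one matrix, which rules out minimality.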


Exercise. Verify that (5.6.34) defines a matrix norm. More generally,
show that if $\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_k}$ are given matrix norms on $M_n$, then

$$\|A\| \equiv \max\{\|A\|_{\alpha_1}, \ldots, \|A\|_{\alpha_k}\}$$

defines a matrix norm on $M_n$.


The induced norms are minimal among all matrix norms, but suppose
one considers only the important class of unitarily invariant matrix
norms. These are the matrix norms $\|\cdot\|$ such that $\|A\| = \|UAV\|$ for all
$A \in M_n$ and all unitary matrices $U, V \in M_n$. It turns out that in this class
there is only one minimal matrix norm, and that is the spectral norm.


5.6.35 Corollary. If $\|\cdot\|$ is a unitarily invariant matrix norm, then
$\|A\|_2 \le \|A\|$ for all $A \in M_n$. The spectral norm $\|\cdot\|_2$ is the only matrix
norm on $M_n$ that is both induced and unitarily invariant.


Proof: Suppose that $\|\cdot\|$ is a given unitarily invariant matrix norm. By
part (a) of Theorem (5.6.26), we know that $N(A) \le \|A\|$ for all $A \in M_n$,
where $N(\cdot)$ is induced by the vector norm $|\cdot|$ defined by (5.6.27). If
$x \in \mathbf{C}^n$ is a given vector, there is a
unitary matrix $U$ such that $Ux = |x|_2 e_1$. Thus, $|x| = \|X\| = \|UX\| = |Ux|
= |x|_2 |e_1|$ for all $x \in \mathbf{C}^n$. The vector norm $|\cdot|$ is
therefore a scalar multiple of the Euclidean norm and Corollary (5.6.23)
says that the matrix norm induced by $|\cdot|$ equals $\|\cdot\|_2$ (the matrix
norm induced by the Euclidean norm). Thus, $\|A\|_2 = N(A) \le \|A\|$ for all
$A \in M_n$. If $\|\cdot\|$ is assumed to be induced, then it is minimal and hence
$\|A\|_2 = \|A\|$ for all $A \in M_n$. □

If $\|\cdot\|$ is a matrix norm on $M_n$, then the function $\|\cdot\|^*$ defined by

$$\|A\|^* \equiv \|A^*\|$$

is also a matrix norm on $M_n$. A matrix norm need not coincide with the
norm constructed from it in this way; for example, $\|A\|_1^* = \|A\|_\infty \ne \|A\|_1$
in general. A matrix norm such
that $\|\cdot\|^* = \|\cdot\|$ is said to be self-adjoint. The Frobenius and $l_1$ matrix
norms are self-adjoint, and since

$$\|A^*\|_2^2 = \rho(AA^*) = \rho(A^*A) = \|A\|_2^2$$

the spectral norm is self-adjoint, too. In fact, all unitarily invariant
norms on $M_n$ are self-adjoint [see (7.4), Problem 2]. The spectral norm is
distinguished as the only induced matrix norm that is self-adjoint.


5.6.36 Theorem. Let $\|\cdot\|$ be a matrix norm on $M_n$.

(a) $\|\cdot\|^*$ is an induced norm if and only if $\|\cdot\|$ is an induced norm.

(b) If the matrix norm $\|\cdot\|$ is induced by the vector norm $|\cdot|$, then
$\|\cdot\|^*$ is induced by the dual norm $|\cdot|^D$.

(c) The spectral norm $\|\cdot\|_2$ is the only matrix norm on $M_n$ that is
both induced and self-adjoint.


Proof: If $N(\cdot)$ is a matrix norm such that $N(A) \le \|A\|^*$ for all
$A \in M_n$, then $N(A)^* = N(A^*) \le \|A^*\|^* = \|A\|$ for all $A \in M_n$. If $\|\cdot\|$ is a minimal
matrix norm, then $N(\cdot)^* = \|\cdot\|$ and hence $N(\cdot) = \|\cdot\|^*$, so $\|\cdot\|^*$ is a
minimal matrix norm. The assertion (a) follows from (5.6.32). Now suppose
that $\|\cdot\|$ is induced by the vector norm $|\cdot|$. Using the duality theorem
(5.5.14), we have

$$\|A\|^* = \|A^*\| = \max_{|x|=1} |A^*x| = \max_{|x|=1} (|A^*x|^D)^D = \max_{|x|=1} \max_{|z|^D=1} |(A^*x)^* z| = \max_{|z|^D=1} \max_{|x|=1} |x^*Az| = \max_{|z|^D=1} |Az|^D$$

and hence $\|\cdot\|^*$ is induced by $|\cdot|^D$. For the last assertion, we observe
that if the matrix norm $\|\cdot\|$ is induced by the vector norm $|\cdot|$, and if
$\|\cdot\| = \|\cdot\|^*$, then (b) says that $\|\cdot\|$ is also induced by $|\cdot|^D$. But Corollary
(5.6.23) says that the vector norm that induces a given matrix norm
is uniquely determined up to a positive scalar factor, and hence there
exists some $c > 0$ such that $|\cdot|^D = c|\cdot|$. By (5.4.16) we must then have
$|\cdot| = |\cdot|_2 / \sqrt{c}$. Since the given vector norm is a multiple of the Euclidean
vector norm, they both induce the same matrix norm and we conclude
that $\|\cdot\| = \|\cdot\|_2$. □

Exercise. Show that $\|\cdot\|^*$ is a matrix norm whenever $\|\cdot\|$ is a matrix
norm.

Exercise. Give an example to show that a self-adjoint matrix norm need
not be unitarily invariant.


Absolute and monotone vector norms were introduced in (5.5), and 
are the most commonly used vector norms. There is a simple and useful 
characterization of the matrix norms that are induced by monotone 


vector norms. 


5.6.37 Theorem. Let $|\cdot|$ be a vector norm on $\mathbf{C}^n$ and let $\|\cdot\|$ be the
matrix norm on $M_n$ that it induces. The following are equivalent:

(a) $|\cdot|$ is an absolute norm; that is, the vector $[|x_i|]$ of entrywise
absolute values has the same norm as $x$ for all $x \in \mathbf{C}^n$.

(b) $|\cdot|$ is a monotone norm; that is, $|x| \le |y|$ whenever $|x_i| \le |y_i|$
for all $i = 1, \ldots, n$.

(c) Whenever $D = \mathrm{diag}(d_1, d_2, \ldots, d_n) \in M_n$, then

$$\|D\| = \max_{1 \le i \le n} |d_i|$$

Proof: The equivalence of (a) and (b) is the content of (5.5.10). If $|\cdot|$ is
monotone, and if we set

$$d \equiv \max_{1 \le i \le n} |d_i| = |d_k|$$

then $|d_i x_i| \le |d\,x_i|$ for every $i$ and hence $|Dx| \le d\,|x|$ with equality for
$x = e_k$. Thus,

$$\|D\| = \max_{x \ne 0} \frac{|Dx|}{|x|} = d$$

and hence (b) implies (c). If we assume (c), let $x, y \in \mathbf{C}^n$ be given with
$|x_i| \le |y_i|$ for all $i$ and note that there are complex numbers $d_k$ such that
$x_k = d_k y_k$ and $|d_k| \le 1$, $k = 1, \ldots, n$. Thus, if $D \equiv \mathrm{diag}(d_1, \ldots, d_n)$, we have
$Dy = x$ and $\|D\| \le 1$. Since

$$|x| = |Dy| \le \|D\|\,|y| \le |y|$$

the norm $|\cdot|$ must be monotone. □


Problems

1. Give an example of a vector norm for matrices for which $|I| < 1$.


2. A matrix $A$ such that $A^2 = A$ is said to be idempotent. Give an
example of a 2-by-2 idempotent matrix other than $I$ and 0. Show that 0
and 1 are the only possible eigenvalues of an idempotent matrix. Show
that an idempotent matrix $A$ must always be diagonalizable and that
$\|A\| \ge 1$ for any matrix norm $\|\cdot\|$ if $A \ne 0$.


3. If $\|\cdot\|$ is a matrix norm on $M_n$, show that $c\|\cdot\|$ is a matrix norm for
all $c \ge 1$. Show, however, that neither $c\|\cdot\|_1$ nor $c\|\cdot\|_2$ is a matrix norm for
any $c < 1$.


4. In Definition (5.6.1) the same vector norm is involved in two different
ways. More generally, we might define $\|\cdot\|_{\alpha,\beta}$ by

$$\|A\|_{\alpha,\beta} \equiv \max_{|x|_\alpha = 1} |Ax|_\beta$$

where $|\cdot|_\alpha$ and $|\cdot|_\beta$ are two (possibly different) vector norms. Is such a
function a matrix norm? Study $\|\cdot\|_{\alpha,\beta}$ to determine what interesting
properties it might have; note that this notion might be used to
define a norm on nonsquare matrices, since $|\cdot|_\alpha$ may be taken to be a vector
norm on $\mathbf{C}^n$ and $|\cdot|_\beta$ may be taken to be a vector norm on $\mathbf{C}^m$. What
properties like those of an induced matrix norm does $\|\cdot\|_{\alpha,\beta}$ have in this
regard?


5. Show that the Euclidean norm $|\cdot|_2$ and the spectral norm $\|\cdot\|_2$ are
unitarily invariant; that is, $A$ and $UAV$ have the same
norm whenever $U$ and $V$ are unitary. Compare the matrix norms $|\cdot|_2$ and
$\|\cdot\|_2$ in as many respects as you can. Note that $|A|_2 = [\mathrm{tr}\, A^*A]^{1/2}$.


6. Verify that axioms (1)-(3) for $\|\cdot\|$ imply that the same axioms hold
for $\|\cdot\|_S$ in (5.6.7). This verifies that (5.6.7) remains valid if "matrix
norm" in the hypothesis and conclusion is replaced by "vector norm on
matrices."


7. If $\|\cdot\|$ is an induced matrix norm on $M_n$ and if $S \in M_n$ is nonsingular,
show that $\|\cdot\|_S$ [as defined in (5.6.7)] is also an induced matrix norm. If
$\|\cdot\|$ is induced by the vector norm $|\cdot|$, show that the matrix norm $\|\cdot\|_S$ is
induced by the vector norm $|\cdot|_{S^{-1}}$ [as defined in (5.3.2)].

8. Show that the nonsingular matrices of $M_n$ are dense in $M_n$; that is,
show that every matrix in $M_n$ is the limit of nonsingular matrices. Are the
singular matrices dense in $M_n$, too?


9. Show that the set of vector norms on $\mathbf{C}^n$ is convex for every $n \ge 1$, but that
the set of matrix norms on $M_n$ is not convex for any $n \ge 2$. If $N_1(\cdot)$ and
$N_2(\cdot)$ are matrix norms on $M_n$, show that $N(\cdot) \equiv \tfrac{1}{2}[N_1(\cdot) + N_2(\cdot)]$ is a
matrix norm if and only if

$$[N_1(A) - N_2(A)]\,[N_1(B) - N_2(B)] \le 2[N_1(A) N_1(B) - N_1(AB)] + 2[N_2(A) N_2(B) - N_2(AB)]$$

_ fo! 
for all A, Be M,,. Hint: Consider N,(+) = feh, Me) = plz, A= [o 1 sand 


B= A’. See Example (7.4.54) for an important subset of the matrix norms 
that is convex.

10. Show that the $l_1$ vector norm on $M_n$, $|A|_1 = \sum_{i,j=1}^{n} |a_{ij}|$, is a matrix norm that is not an induced norm.
11. Show that all of the following are equivalent ways to compute the spectral norm (5.6.6):

$$\|A\|_2 = \max_{|x|_2 = 1} |Ax|_2 = \max_{|x|_2 \le 1} |Ax|_2 = \max_{x \ne 0} \frac{|Ax|_2}{|x|_2} = \max_{|x|_2 = |y|_2 = 1} |y^*Ax| = \max_{|x|_2 \le 1,\ |y|_2 \le 1} |y^*Ax|$$

Use these identities to show that $\|A\|_2 = \|A^*\|_2$ for all $A \in M_n$. Now prove that $\|A^*A\|_2 = \|A\|_2^2$, using the facts that $\|\cdot\|_2$ is a matrix norm and $A^*A$ is Hermitian.

12. If $\rho(A) < 1$, $A \in M_n$, show that the series $I + A + A^2 + \cdots$ converges to the sum $(I - A)^{-1}$.
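The convergence claimed in Problem 12 can be checked numerically on a small case. The sketch below is only an illustration: the matrix `A` is a hypothetical example (upper triangular, so its spectral radius can be read off the diagonal), and the helper names are not from the text.

```python
# Partial sums of I + A + A^2 + ... for a matrix with spectral radius < 1.

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_sum(A, terms):
    """Return I + A + ... + A^(terms-1)."""
    n = len(A)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical example: triangular, so rho(A) = max(0.5, 0.25) = 0.5 < 1.
A = [[0.5, 1.0],
     [0.0, 0.25]]
S = neumann_sum(A, 200)

# If S approximates (I - A)^{-1}, then (I - A) S should be close to I.
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]
residual = matmul(I_minus_A, S)
max_error = max(abs(residual[i][j] - (1.0 if i == j else 0.0))
                for i in range(2) for j in range(2))
```

The residual equals $-A^{200}$ up to rounding, which is why it is negligible here; a matrix with spectral radius close to 1 would need many more terms.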

13. If $A \in M_n$ is not invertible, show that $\|I - A\| \ge 1$ for every matrix norm $\|\cdot\|$.


14. Let $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ be given matrix norms on $M_n$. Show that $\|A\| = \max\{\|A\|_\alpha, \|A\|_\beta\}$ is a matrix norm on $M_n$. When is it an induced norm?




15. Give an example of a matrix $A$ such that $\rho(A) < \|A\|$ for every matrix norm $\|\cdot\|$.


16. Let $A = [a_{ij}] \in M_n$. Show that the function $\|\cdot\|$ defined on $M_n$ by $\|A\| = n \max_{1 \le i,j \le n} |a_{ij}|$ is a matrix norm that is not induced when $n \ge 2$.


17. Use the idea in Problem 12 to compute the inverse of the matrix 


1-2 |] 
0 1 3 
0 0 1 


Hint: Only three terms in the series are nonzero. 
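The idea behind Problem 17 can be sketched as follows: if $A$ is upper triangular with 1's on the diagonal, then $N = I - A$ is strictly upper triangular, so $N^n = 0$ and the series of Problem 12 for $(I - N)^{-1} = A^{-1}$ terminates. The matrix below is a hypothetical example chosen for illustration, not necessarily the matrix printed in the problem.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_unit_upper_triangular(A):
    """Invert A (upper triangular, ones on the diagonal) as
    I + N + ... + N^(n-1), where N = I - A is nilpotent."""
    n = len(A)
    I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    N = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    total, power = I, I
    for _ in range(n - 1):
        power = matmul(power, N)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical 3-by-3 example with integer entries, so the arithmetic is exact.
A = [[1, 2, 3],
     [0, 1, 4],
     [0, 0, 1]]
A_inv = inverse_unit_upper_triangular(A)
product = matmul(A, A_inv)  # should be the identity matrix
```

For a 3-by-3 matrix only $I$, $N$, and $N^2$ contribute, in line with the hint.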


18. Explain how to generalize the method in Problem 17 to invert a general nonsingular upper triangular matrix $A \in M_n$. Hint: Choose a diagonal matrix $D$ such that $DA$ has all 1's on the main diagonal.


19. Show that the spectral radius $\rho(\cdot)$ is a continuous and homogeneous function on $M_n$, but that it is neither a matrix norm nor a vector norm on $M_n$, because

(a) $\rho(A) = 0$ is possible for some $A \ne 0$;

(b) $\rho(A + B) > \rho(A) + \rho(B)$ is possible; and

(c) $\rho(AB) > \rho(A)\rho(B)$ is possible even if $\rho(A)$ and $\rho(B)$ are both nonzero.

Hint: Consider lS al, [o ol [o al and lo il. 
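The three failures listed in Problem 19 can be observed on 2-by-2 matrices. The sketch below uses the closed-form eigenvalues of a 2-by-2 matrix; the particular matrices are illustrative choices made here and are not claimed to be the ones in the hint.

```python
import cmath

def spectral_radius_2x2(M):
    """Largest eigenvalue modulus of a 2-by-2 matrix, from the quadratic formula."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

# (a) and (b): nonzero nilpotent matrices have spectral radius 0,
# yet their sum has spectral radius 1.
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
rho_A = spectral_radius_2x2(A)
rho_sum = spectral_radius_2x2(add2(A, B))

# (c): both factors have spectral radius 1, the product does not.
C = [[1, 1], [0, 1]]
D = [[1, 0], [1, 1]]
rho_C = spectral_radius_2x2(C)
rho_D = spectral_radius_2x2(D)
rho_CD = spectral_radius_2x2(matmul2(C, D))
```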


20. Show that $|AB|_2 \le \|A\|_2\,|B|_2$ and $|AB|_2 \le |A|_2\,\|B\|_2$ for all $A, B \in M_n$.

21. Show that $\|A\|_2^2 \le \|A\|_1\,\|A\|_\infty$ for all $A \in M_n$. How does this compare with the bound you can get directly from the table in Problem 23? Why the difference? Hint: $\rho(A^*A) \le \|A^*A\|_1$ and $\|A^*\|_1 = \|A\|_\infty$.


22. Let $|\cdot|_\alpha$ be a given vector norm on $C^n$ and define $|\cdot|_\beta = (|\cdot|_\alpha)^D$ to be its dual norm. Let $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ denote the matrix norms on $M_n$ that are induced by $|\cdot|_\alpha$ and $|\cdot|_\beta$, respectively. Use (5.6.36) to show that $\|A^*\|_\beta = \|A\|_\alpha$ for all $A \in M_n$. Deduce that $\|A\|_2^2 \le \|A\|_\alpha\,\|A\|_\beta$ for all $A \in M_n$ and explain how this generalizes the result in Problem 21. How is this inequality related to (5.4.13) for $x = y$?


23. Verify that the entries in the following table give the best constants $C_M$ such that $\|A\|_\alpha \le C_M \|A\|_\beta$ for all $A \in M_n$. All the norms in the table


ing the (i, j) entries in the table. The matrix given in each case is one for 
which the inequality $\|A\|_\alpha \le C_M \|A\|_\beta$ is an equality for the given value of the constant $C_M$.


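$$
\begin{array}{c|cccccc}
\alpha \backslash \beta & \|\cdot\|_1 & \|\cdot\|_2 & \|\cdot\|_\infty & |\cdot|_1 & |\cdot|_2 & n|\cdot|_\infty \\ \hline
\|\cdot\|_1      & 1        & \sqrt{n} & n        & 1 & \sqrt{n} & 1 \\
\|\cdot\|_2      & \sqrt{n} & 1        & \sqrt{n} & 1 & 1        & 1 \\
\|\cdot\|_\infty & n        & \sqrt{n} & 1        & 1 & \sqrt{n} & 1 \\
|\cdot|_1        & n        & n^{3/2}  & n        & 1 & n        & n \\
|\cdot|_2        & \sqrt{n} & \sqrt{n} & \sqrt{n} & 1 & 1        & 1 \\
n|\cdot|_\infty  & n        & n        & n        & n & n        & 1
\end{array}
$$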


The following matrices are all in $M_n$:

$I$ is the identity matrix

$J$ has all entries 1

$A_1$ has all 1's in its first column and all other entries are 0

$A_2$ has only the (1,1) entry 1 and all other entries are 0

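(1,2) follows from (2,1) by (5.6.21)

(1,3) $\|A\|_1 \le |A|_1 \le n\|A\|_\infty$; $A_1$

(1,5) $\sum_{i=1}^n |a_{ij}| \le \left[n \sum_{i=1}^n |a_{ij}|^2\right]^{1/2} \le \sqrt{n}\left[\sum_{i,j=1}^n |a_{ij}|^2\right]^{1/2}$ for each $j$ (Cauchy-Schwarz inequality); $A_1$

(1,6) $\max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}| \le n \max_{1 \le i,j \le n} |a_{ij}|$; $J$

(2,1) follows from (2,5) and (5,1); $A_1^T$

(2,3) follows from (2,5) and (5,3); $A_1$

(2,4) follows from (2,5) and (5,4); $A_2$

(2,5) $\|A\|_2^2 = \rho(A^*A) \le \sum_{i=1}^n \lambda_i(A^*A) = \operatorname{tr} A^*A = |A|_2^2$; $A_1$

(2,6) follows from (2,5) and (5,6); $J$

(3,1) follows from (1,3) by (5.6.21); $A_1^T$

(3,2) follows from (2,3) by (5.6.21); $A_1^T$

(3,4) $A_1^T$

(3,5) similar to (1,5); $A_1^T$

(3,6) similar to (1,6); $J$

(4,1) $\sum_{j=1}^n \sum_{i=1}^n |a_{ij}| \le n \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$; $I$

(4,2) follows from (4,5) and (5,2); the following matrix gives equality in both: take $\alpha = e^{2\pi i/n}$, notice that $(\bar{\alpha})^k = \alpha^{-k}$ and $\sum_{k=1}^n \alpha^{kj} = 0$ if $j \ne 0$, $= n$ if $j = 0$; let the $(k, j)$ entry of $A$ be $\alpha^{kj}$ and check that $A^*A = nI$, $\|A\|_2 = \sqrt{n}$, $|A|_1 = n^2$, and $|A|_2 = n$

(4,3) similar to (4,1); $I$

(4,5) $\left[\sum_{i,j=1}^n |a_{ij}|\right]^2 = \sum_{i,j,p,q=1}^n |a_{ij}|\,|a_{pq}| \le \sum_{i,j,p,q=1}^n \tfrac{1}{2}\left(|a_{ij}|^2 + |a_{pq}|^2\right) = n^2 \sum_{i,j=1}^n |a_{ij}|^2$ (arithmetic-geometric mean inequality); $J$

(4,6) $\sum_{i,j=1}^n |a_{ij}| \le n^2 \max_{1 \le i,j \le n} |a_{ij}|$; $J$

(5,1) $\sum_{j=1}^n \sum_{i=1}^n |a_{ij}|^2 \le \sum_{j=1}^n \left[\sum_{i=1}^n |a_{ij}|\right]^2 \le n \left[\max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|\right]^2$; $I$

(5,2) $\sum_{i,j=1}^n |a_{ij}|^2 = \operatorname{tr} A^*A = \sum_{i=1}^n \lambda_i(A^*A) \le n \lambda_{\max}(A^*A)$; $I$

(5,3) similar to (5,1); $I$

(5,4) $\sum_{i,j=1}^n |a_{ij}|^2 \le \left[\sum_{i,j=1}^n |a_{ij}|\right]^2$; $A_2$

(5,6) $\sum_{i,j=1}^n |a_{ij}|^2 \le n^2 \max_{1 \le i,j \le n} |a_{ij}|^2$; $J$

(6,1) $\max_{1 \le i,j \le n} |a_{ij}| \le \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$; $I$

(6,2) $\max_{1 \le i,j \le n} |a_{ij}|^2 \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ji}|^2 = \max_{1 \le i \le n} (A^*A)_{ii} \le \rho(A^*A)$; $I$

(6,3) similar to (6,1); $I$

(6,4) $\max_{1 \le i,j \le n} |a_{ij}| \le \sum_{i,j=1}^n |a_{ij}|$; $A_2$

(6,5) $\max_{1 \le i,j \le n} |a_{ij}|^2 \le \sum_{i,j=1}^n |a_{ij}|^2$; $A_2$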
24. Show that the bound (5,2) in Problem 23 can be improved to $|A|_2 \le [\operatorname{rank} A]^{1/2}\,\|A\|_2$. Hint: rank $A$ = number of nonzero eigenvalues of $A^*A$.




25. Let $A \in M_n$ be given. If $\varepsilon > 0$, we know by Lemma (5.6.10) that there is some matrix norm $\|\cdot\|$ such that $\rho(A) \le \|A\| \le \rho(A) + \varepsilon$. Show that there is a nonsingular matrix $C = C(\varepsilon) \in M_n$ such that $\rho(A) \le \|CAC^{-1}\|_2 \le \rho(A) + \varepsilon$. Hint: Use the same construction as in Lemma (5.6.10) and show
that ||CAC~'I]} = p(A*A)+ Of) as e 70. 


26. Show that $|A|_2^2 \ge \sum_{i=1}^n |\lambda_i|^2$ for all $A \in M_n$, with equality if and only if $A$ is normal. The quantity

$$\left[\,|A|_2^2 - \sum_{i=1}^n |\lambda_i|^2\right]^{1/2}$$

is sometimes called the defect from normality for this reason. Hint: Use the Schur triangularization theorem and the fact that the Frobenius norm is unitarily invariant.
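A small computation illustrates the defect from normality of Problem 26. This sketch makes a simplifying assumption that is not in the text: it only handles upper triangular matrices, whose eigenvalues are their diagonal entries, so no eigenvalue solver is needed. The function name and example matrices are hypothetical.

```python
import math

def defect_from_normality_triangular(T):
    """Defect from normality of an upper triangular matrix T.

    The eigenvalues of a triangular matrix are its diagonal entries,
    so the defect reduces to the l2 norm of the off-diagonal part."""
    n = len(T)
    frobenius_sq = sum(abs(T[i][j]) ** 2 for i in range(n) for j in range(n))
    eigenvalue_sq = sum(abs(T[i][i]) ** 2 for i in range(n))
    return math.sqrt(frobenius_sq - eigenvalue_sq)

T = [[1, 2],
     [0, 3]]   # not normal: nonzero entry above the diagonal
D = [[1, 0],
     [0, 3]]   # diagonal, hence normal

defect_T = defect_from_normality_triangular(T)
defect_D = defect_from_normality_triangular(D)
```

By the Schur triangularization theorem and unitary invariance of the Frobenius norm, the triangular case is in fact the general case up to unitary similarity.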


27. Theorem (5.6.9) and the companion matrix can be used to give bounds for the zeroes of a polynomial with real or complex coefficients. Any polynomial $f(z)$ of degree at least 1 can be written in the form $f(z) = Cz^k p(z)$, where $C$ is a nonzero constant,

$$p(z) = z^n + a_{n-1}z^{n-1} + a_{n-2}z^{n-2} + \cdots + a_1 z + a_0 \qquad (5.6.38)$$

and $a_0 \ne 0$. The roots of $p(z) = 0$ are the nonzero roots of $f(z) = 0$, and it is these roots for which we can give various bounds. (a) Show that the characteristic polynomial of the companion matrix

$$C(p) = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \qquad (5.6.39)$$

is exactly $p(z)$, and hence the eigenvalues of $C(p)$ are the same as the roots of $p(z) = 0$. Hint: Compute $\det[zI - C(p)]$ using cofactors of the first column and use induction. (b) Use Theorem (5.6.9) to show that if $\tilde{z}$ is a root of $p(z) = 0$ and if $\|\cdot\|$ is any matrix norm on $M_n$, then $|\tilde{z}| \le \|C(p)\|$. In the following, $\tilde{z}$ represents any root of $p(z) = 0$. (c) Use $\|\cdot\|_1$ to show that

$$|\tilde{z}| \le \max\{|a_0|,\ 1 + |a_1|,\ \ldots,\ 1 + |a_{n-1}|\} \le 1 + \max\{|a_0|, |a_1|, \ldots, |a_{n-1}|\} \qquad (5.6.40)$$

This bound on the roots is known as Cauchy's bound. (d) Use $\|\cdot\|_\infty$ to show that

$$|\tilde{z}| \le \max\{1,\ |a_0| + |a_1| + \cdots + |a_{n-1}|\} \le 1 + |a_0| + |a_1| + \cdots + |a_{n-1}| \qquad (5.6.41)$$

This is known as Montel's bound. Show that ... bound. (e) Use $|\cdot|_1$ to show that

$$|\tilde{z}| \le (n - 1) + |a_0| + |a_1| + \cdots + |a_{n-1}|$$

which is a poorer bound than (d) for all $n \ge 2$. (f) Use $|\cdot|_2$ to show that

$$|\tilde{z}| \le \left[(n - 1) + |a_0|^2 + |a_1|^2 + \cdots + |a_{n-1}|^2\right]^{1/2}$$

which is a poorer bound than Carmichael and Mason's bound (5.6.42). (g) Use $n|\cdot|_\infty$ to show that

$$|\tilde{z}| \le n \max\{1, |a_0|, |a_1|, \ldots, |a_{n-1}|\}$$

which is a poorer bound than (5.6.41).
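The bounds of Problem 27 can be tried on a polynomial whose roots are known in advance. The sketch below builds the companion matrix (5.6.39) and evaluates the maximum column sum and maximum row sum norms; the polynomial $(z-1)(z-2)(z+3) = z^3 - 7z + 6$ is a hypothetical test case chosen here, and the helper names are not from the text.

```python
import math

def companion(coeffs):
    """Companion matrix (5.6.39) of z^n + a_{n-1} z^{n-1} + ... + a_0.

    `coeffs` lists a_0, a_1, ..., a_{n-1} in that order."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    C[0] = [-coeffs[n - 1 - j] for j in range(n)]   # first row: -a_{n-1}, ..., -a_0
    for i in range(1, n):
        C[i][i - 1] = 1                             # ones on the subdiagonal
    return C

def max_column_sum(M):
    n = len(M)
    return max(sum(abs(M[i][j]) for i in range(n)) for j in range(n))

def max_row_sum(M):
    return max(sum(abs(v) for v in row) for row in M)

# Hypothetical test polynomial with known roots 1, 2, -3:
# (z - 1)(z - 2)(z + 3) = z^3 + 0 z^2 - 7 z + 6.
coeffs = [6, -7, 0]          # a_0, a_1, a_2
roots = [1, 2, -3]

C = companion(coeffs)
cauchy = max_column_sum(C)   # bound (5.6.40), first form
montel = max_row_sum(C)      # bound (5.6.41), first form
carmichael_mason = math.sqrt(1 + sum(abs(a) ** 2 for a in coeffs))  # (5.6.42)

largest_root = max(abs(z) for z in roots)
```

For this polynomial the three bounds are 8, 13, and $\sqrt{86}$, all above the largest root modulus 3; which bound is tightest depends on the coefficients.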
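28. Using the same notation as in Problem 27, improve the bound in part (f). Write the companion matrix as $C = S + R$, where

$$S = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}
\quad\text{and}\quad
R = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix}$$

and show that $S^*R = R^*S = 0$. Show that $\|S^*S\|_2 = 1$ and $\|R^*R\|_2 = |a_0|^2 + |a_1|^2 + \cdots + |a_{n-1}|^2$. Show that

$$\|C\|_2^2 = \|C^*C\|_2 = \|(S + R)^*(S + R)\|_2 = \|S^*S + R^*R\|_2 \le \|S^*S\|_2 + \|R^*R\|_2$$

and deduce Carmichael and Mason's bound

$$|\tilde{z}| \le \left[1 + |a_0|^2 + |a_1|^2 + \cdots + |a_{n-1}|^2\right]^{1/2} \qquad (5.6.42)$$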


29. Apply the bound (5.6.41) to the polynomial

$$q(z) = (z - 1)\,p(z) = z^{n+1} + (a_{n-1} - 1)z^n + (a_{n-2} - a_{n-1})z^{n-1} + \cdots + (a_0 - a_1)z - a_0$$

and show that

$$|\tilde{z}| \le \max\{1,\ |a_0| + |a_0 - a_1| + \cdots + |a_{n-2} - a_{n-1}| + |a_{n-1} - 1|\}$$

Show that the second term in this expression is not less than 1 and deduce another bound of Montel

$$|\tilde{z}| \le |a_0| + |a_0 - a_1| + \cdots + |a_{n-2} - a_{n-1}| + |a_{n-1} - 1| \qquad (5.6.43)$$


30. Use Montel's bound (5.6.43) to prove Kakeya's theorem: If $f(z) = a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0$ is a given polynomial with real nonnegative coefficients that are monotone in the sense that $a_n \ge a_{n-1} \ge \cdots \ge a_1 \ge a_0$, then all the roots of $f(z) = 0$ lie in the unit disc, that is, all $|\tilde{z}| \le 1$.

31. The preceding four problems have all concerned upper bounds on the absolute values of the roots of $p(z) = 0$, but they can be used to obtain lower bounds as well. Show that if $p(z)$ is given by (5.6.38) with $a_0 \ne 0$, then the function

$$q(z) = \frac{1}{a_0}\,z^n p\!\left(\frac{1}{z}\right) = z^n + \frac{a_1}{a_0}z^{n-1} + \frac{a_2}{a_0}z^{n-2} + \cdots + \frac{a_{n-1}}{a_0}z + \frac{1}{a_0}$$

is a polynomial of degree $n$ whose zeroes are exactly the reciprocals of the roots of $p(z) = 0$. Use the respective upper bounds on the roots of $q(z) = 0$ to obtain the following lower bounds on the roots $\tilde{z}$ of $p(z) = 0$.

Cauchy:

$$|\tilde{z}| \ge \frac{|a_0|}{\max\{1,\ |a_0| + |a_{n-1}|,\ |a_0| + |a_{n-2}|,\ \ldots,\ |a_0| + |a_1|\}} \ge \frac{|a_0|}{|a_0| + \max\{1, |a_{n-1}|, |a_{n-2}|, \ldots, |a_1|\}}$$

Montel:

$$|\tilde{z}| \ge \frac{|a_0|}{\max\{|a_0|,\ 1 + |a_1| + |a_2| + \cdots + |a_{n-1}|\}} \ge \frac{|a_0|}{1 + |a_0| + |a_1| + \cdots + |a_{n-1}|}$$

Carmichael and Mason:

$$|\tilde{z}| \ge \frac{|a_0|}{\left[1 + |a_0|^2 + |a_1|^2 + \cdots + |a_{n-1}|^2\right]^{1/2}}$$


32. When the lower bounds in Problem 31 are combined with the upper bounds in Problems 27-30, it is possible to locate the zeroes of $p(z)$ in an annulus $\{z : r_1 \le |z| \le r_2\}$. As an example, consider

$$f(z) = \frac{1}{n!}z^n + \frac{1}{(n-1)!}z^{n-1} + \cdots + \frac{1}{2}z^2 + z + 1$$

which is the $n$th partial sum of the power series for the exponential function $e^z$. Show that all roots $\tilde{z}$ of $f(z) = 0$ satisfy the inequality

$$\tfrac{1}{2} \le |\tilde{z}| \le 1 + n!$$

Apply Kakeya's theorem to $z^n f(1/z)$ to show that all the roots actually satisfy $|\tilde{z}| \ge 1$.

33. Since $\rho(A) = \rho(D^{-1}AD)$ for any nonsingular matrix $D$, the methods used in Problem 27 can be applied to $D^{-1}C(p)D$ to obtain other bounds on the zeroes of the polynomial $p(z)$ in (5.6.38). Make the computationally convenient choice $D = \operatorname{diag}(p_1, p_2, \ldots, p_n)$ with all $p_i > 0$ and generalize Cauchy's bound (5.6.40) to

$$|\tilde{z}| \le \max\left\{ |a_0|\frac{p_n}{p_1},\ |a_1|\frac{p_{n-1}}{p_1} + \frac{p_{n-1}}{p_n},\ |a_2|\frac{p_{n-2}}{p_1} + \frac{p_{n-2}}{p_{n-1}},\ \ldots,\ |a_{n-2}|\frac{p_2}{p_1} + \frac{p_2}{p_3},\ |a_{n-1}| + \frac{p_1}{p_2} \right\} \qquad (5.6.44)$$

which holds for any positive parameters $p_1, p_2, \ldots, p_n$.


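34. If all the coefficients $a_k$ in (5.6.38) are nonzero, choose $p_k = p_1/|a_{n-k+1}|$, $k = 2, 3, \ldots, n$, and deduce Kojima's bound on the zeroes $\tilde{z}$ of $p(z)$ from (5.6.44):

$$|\tilde{z}| \le \max\left\{ \left|\frac{a_0}{a_1}\right|,\ 2\left|\frac{a_1}{a_2}\right|,\ 2\left|\frac{a_2}{a_3}\right|,\ \ldots,\ 2\left|\frac{a_{n-2}}{a_{n-1}}\right|,\ 2|a_{n-1}| \right\} \qquad (5.6.45)$$

35. Now choose $p_k = r^k$, $k = 1, 2, \ldots, n$, for some $r > 0$ and show that (5.6.44) implies the bound

$$|\tilde{z}| \le \max\{ |a_0|r^{n-1},\ |a_1|r^{n-2} + r^{-1},\ |a_2|r^{n-3} + r^{-1},\ \ldots,\ |a_{n-2}|r + r^{-1},\ |a_{n-1}| + r^{-1} \} \le \frac{1}{r} + \max_{0 \le k \le n-1} \{|a_k|r^{n-k-1}\} \quad \text{for any } r > 0 \qquad (5.6.46)$$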


36. If $A \in M_n$, show that the Hermitian matrix

$$\hat{A} = \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} \in M_{2n}$$

has the same spectral norm ($\|\cdot\|_2$) as $A$. Hint: Recall that $\|\hat{A}\|_2^2 = \rho(\hat{A}^*\hat{A})$ in general.


37. If $A, B \in M_n$, if $A$ is nonsingular, if $B$ is singular, and if $\|\cdot\|$ is any matrix norm, show that $\|A - B\| \ge 1/\|A^{-1}\|$. Hint: $B = A - (A - B) = A[I - A^{-1}(A - B)]$ is singular, so $\|A^{-1}(A - B)\| \ge 1$. What does this mean geometrically in $M_n$? How closely can a nonsingular matrix be approximated by a singular matrix? See (7.4.1) for more information about this question.
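The inequality of Problem 37 says that no singular matrix lies closer to $A$ than $1/\|A^{-1}\|$. The sketch below checks this for one nonsingular 2-by-2 matrix, using the maximum row sum norm as the matrix norm; the matrix `A` and the singular candidates are hypothetical examples chosen for illustration.

```python
def norm_inf(M):
    """Maximum row sum matrix norm."""
    return max(sum(abs(v) for v in row) for row in M)

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0],
     [1.0, 3.0]]
lower_bound = 1.0 / norm_inf(inverse_2x2(A))

# A few singular matrices (each has determinant zero).
singular_candidates = [
    [[2.0, 1.0], [1.0, 0.5]],
    [[2.0, 1.0], [2.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
distances = [
    norm_inf([[A[i][j] - B[i][j] for j in range(2)] for i in range(2)])
    for B in singular_candidates
]
```

None of these candidates attains the bound 1.25; the check only confirms the inequality, not that the bound is attained.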


Further Readings. The bounds in the table in Problem 23 come from 
B. J. Stone, “Best Possible Ratios of Certain Matrix Norms,” Numerische 
Math. 4 (1962), 114-116, which also contains some additional bounds and 
references. For additional references and more discussion of the use of 
matrix norms to locate zeroes of polynomials (Problems 27-35), see 
M. Fujii and F. Kubo, “Operator Norms as Bounds for Roots of Alge- 
braic Equations,” Proc. Japan Acad. 49 (1973), 805-808. A more general 
discussion of the problem of determining bounds between induced norms 
[Theorem (5.6.18)] is in H. Schneider and W. G. Strang, “Comparison 
Theorems for Supremum Norms,” Numerische Math. 4 (1962), 15-20. 
There is a discussion of minimal matrix norms in [Wie]. 


5.7 Vector norms on matrices 


Although all the axioms for a vector norm are necessary for a useful 
notion of “size” for matrices, for some important applications the sub- 
multiplicativity axiom (4) for a matrix norm is not necessary. For example, 
the very useful limit (5.6.14) is actually true for a class of functions even 
more general than vector norms, not just for matrix norms. For this 
reason, we focus here on vector norms on matrices, that is, vector norms (which may not be submultiplicative) on the vector space $M_n$. Such norms are often called generalized matrix norms. We shall denote a generic vector norm on $M_n$ by $|\cdot|$ or by $G(\cdot)$ and we begin with some examples of vector norms on $M_n$ that may or may not be matrix norms.


Example 1. If $G(\cdot)$ is a vector norm on $M_n$, and if $T, S \in M_n$ are nonsingular, then

$$G_{T,S}(A) = G(TAS), \quad A \in M_n \qquad (5.7.1)$$

is a vector norm on $M_n$. Even if $G(\cdot)$ is a matrix norm, $G_{T,S}(\cdot)$ need not be submultiplicative.


Exercise. Show that $G_{T,S}(\cdot)$ in (5.7.1) is always a vector norm on $M_n$.


Exercise. Let $S = T = \tfrac{1}{2}I$, let $G(\cdot) = \|\cdot\|_\infty$, and show that $G_{T,S}(\cdot)$ is not a matrix norm.


Exercise. Show that if $G(\cdot)$ is a matrix norm and $T = S^{-1}$, then $G_{T,S}(\cdot)$ is a matrix norm.


Example 2. The Hadamard product of two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ of the same size is just their element-wise product $A \circ B = [a_{ij}b_{ij}]$. If $H \in M_n$ is a given matrix with no zero entries and if $G(\cdot)$ is any vector norm on $M_n$, then

$$G_H(A) = G(H \circ A) \qquad (5.7.2)$$

is a vector norm on $M_n$. Even if $G(\cdot)$ is a matrix norm, $G_H(\cdot)$ need not be submultiplicative.


Exercise. Show that $G_H(\cdot)$ in (5.7.2) is always a vector norm on $M_n$.


Exercise. Show that $G_H(\cdot)$ in (5.7.2) may or may not be a matrix norm, depending on the choice of $H$. Consider ... and the Hadamard multiplier matrices

$$H_1 = [\ \ldots\ ] \quad\text{or}\quad H_2 = [\ \ldots\ ] \qquad (5.7.3)$$

You may wish to consider

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad\text{and } AB \qquad (5.7.4)$$

Notice that $G_{H_1}(C) \le G_{H_2}(C)$ for all $C \in M_2$.


Example 3. The function

$$G\!\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \tfrac{1}{2}\left[\,|a + d| + |a - d| + |b| + |c|\,\right] \qquad (5.7.5)$$

is a vector norm on $M_2$.


Exercise. Show that $G(\cdot)$ in (5.7.5) is a vector norm but not a matrix norm. You may wish to consider the matrices (5.7.4).


Example 4. If $A \in M_n$ is a given matrix, the set $F(A) = \{x^*Ax : x \in C^n \text{ and } x^*x = 1\}$ is called the field of values or numerical range of $A$, and the function

$$r(A) = \max_{x^*x = 1} |x^*Ax| = \max_{z \in F(A)} |z| \qquad (5.7.6)$$

is called the numerical radius of $A$.



Exercise. Show that the numerical radius $r(A)$ is a vector norm on $M_n$. Hint: The positivity axiom (1a) is the only hard part; see Section (4.1), Problem 6. The numerical radius is not a matrix norm, however. See Problem 10.
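The failure of submultiplicativity for the numerical radius can be seen numerically on the matrices (5.7.4). The sketch below relies on a standard characterization that is not stated in the text and is used here as an assumption: $r(A)$ equals the maximum over $\theta$ of the largest eigenvalue of the Hermitian part of $e^{i\theta}A$. It samples $\theta$ on a grid, so in general it returns a lower estimate of $r(A)$; the function name and sample count are hypothetical.

```python
import cmath
import math

def numerical_radius_2x2(A, samples=3600):
    """Grid estimate of r(A) for a 2-by-2 matrix A.

    For each angle, form H = (wA + (wA)*)/2 with w = e^{i theta} and take
    its largest eigenvalue, which has a closed form for 2-by-2 Hermitian H."""
    best = 0.0
    for k in range(samples):
        w = cmath.exp(1j * 2 * math.pi * k / samples)
        m00, m01 = w * A[0][0], w * A[0][1]
        m10, m11 = w * A[1][0], w * A[1][1]
        p, s = m00.real, m11.real            # diagonal of the Hermitian part
        q = (m01 + m10.conjugate()) / 2      # off-diagonal entry
        lam_max = (p + s) / 2 + math.sqrt(((p - s) / 2) ** 2 + abs(q) ** 2)
        best = max(best, lam_max)
    return best

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
AB = [[1, 0], [0, 0]]    # the product of A and B

r_A = numerical_radius_2x2(A)
r_B = numerical_radius_2x2(B)
r_AB = numerical_radius_2x2(AB)
```

Here $r(A) = r(B) = \tfrac{1}{2}$ while $r(AB) = 1$, so $r(AB) > r(A)\,r(B)$.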


Example 5. The $l_\infty$ vector norm on $M_n$ is

$$|A|_\infty = \max_{1 \le i,j \le n} |a_{ij}| \qquad (5.7.7)$$

We saw in Section (5.6) that $|\cdot|_\infty$ is a vector norm on $M_n$ but not a matrix norm, but $n|\cdot|_\infty$ is a matrix norm.


The preceding examples demonstrate amply that there are indeed many 
vector norms on M, that are not matrix norms. Some of these norms, 
moreover, share some of the properties of matrix norms that follow from 
submultiplicativity, and some do not. But each vector norm on M,, is 
equivalent to any matrix norm (in the sense that they have the same con- 
vergent sequences); in fact, a somewhat more general result follows im- 
mediately from Theorem (5.4.4). 


5.7.8 Theorem. Let $f$ be a pre-norm on $M_n$, that is, a real-valued function on $M_n$ that is positive, homogeneous, and continuous, and let $\|\cdot\|$ be a given matrix norm on $M_n$. Then there exist finite positive constants $C_m$ and $C_M$ such that

$$C_m\|A\| \le f(A) \le C_M\|A\| \qquad (5.7.9)$$

for all $A \in M_n$. In particular, these inequalities hold whenever $f(\cdot)$ is a vector norm on $M_n$.

The equivalence (5.7.9) is often useful in extending facts about matrix norms to vector norms on matrices, or, more generally, to vector pre-norms on matrices. For example, the limit (5.6.14) extends in this manner.


5.7.10 Corollary. If $f$ is a pre-norm on $M_n$, then $\lim_{k \to \infty} [f(A^k)]^{1/k}$ exists for all $A \in M_n$ and

$$\lim_{k \to \infty} [f(A^k)]^{1/k} = \rho(A)$$

for all $A \in M_n$. In particular, this limit holds whenever $f(\cdot)$ is a vector norm on $M_n$.


Proof: Let $\|\cdot\|$ be a matrix norm on $M_n$ and consider the inequality

$$C_m\|A^k\| \le f(A^k) \le C_M\|A^k\|$$

which implies

$$C_m^{1/k}\|A^k\|^{1/k} \le [f(A^k)]^{1/k} \le C_M^{1/k}\|A^k\|^{1/k}$$

for all $k = 1, 2, 3, \ldots$. But $C_m^{1/k} \to 1$, $C_M^{1/k} \to 1$, and $\|A^k\|^{1/k} \to \rho(A)$ as $k \to \infty$, so we conclude that $\lim_{k \to \infty} [f(A^k)]^{1/k}$ exists and has the asserted value. □
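The limit in Corollary (5.7.10) can be observed with the $l_\infty$ vector norm of Example 5, which is not a matrix norm. The sketch below uses a hypothetical triangular matrix whose spectral radius 0.5 can be read off its diagonal; the exponent 200 is an arbitrary illustrative choice.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_entry(M):
    """The l-infinity vector norm (5.7.7) on matrices."""
    return max(abs(v) for row in M for v in row)

# Hypothetical example: triangular, so rho(A) = 0.5.
A = [[0.5, 1.0],
     [0.0, 0.25]]

k = 200
power = A
for _ in range(k - 1):
    power = matmul(power, A)          # power is A^k after the loop

estimate = max_abs_entry(power) ** (1.0 / k)   # should be close to rho(A)
```

The convergence is slow: the estimate differs from $\rho(A)$ by a factor of roughly $4^{1/k}$ in this example, which mirrors the role of the constants $C_m^{1/k}$ and $C_M^{1/k}$ in the proof.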


That the $l_\infty$ norm of Example 5 need only be multiplied by the constant factor $n$ to make it into a matrix norm is no accident: Every vector norm can be so modified.


5.7.11 Theorem. For each vector norm $G(\cdot)$ on $M_n$, there is a finite positive constant $c(G)$ such that $c(G)G(\cdot)$ is a matrix norm on $M_n$. If $\|\cdot\|$ is a matrix norm on $M_n$ and if

$$C_m\|A\| \le G(A) \le C_M\|A\| \quad\text{for all } A \in M_n,$$

then

$$c(G) \le \frac{C_M}{C_m^2}$$

Moreover, there exists a matrix norm for which this upper bound on $c(G)$ is sharp.


Proof: For any $c > 0$, the function $\|\cdot\| = cG(\cdot)$ satisfies all the axioms for a matrix norm except perhaps the submultiplicative axiom. However, one deduces easily from continuity of $G(\cdot)$ and compactness of the unit ball of $G(\cdot)$ that

$$c(G) = \max_{A \ne 0 \ne B} \frac{G(AB)}{G(A)G(B)} = \max_{G(A) = 1 = G(B)} G(AB)$$

is finite and positive. Then

$$G(AB) \le c(G)G(A)G(B) \quad\text{and}\quad c(G)G(AB) \le c(G)G(A)\,c(G)G(B)$$

for all $A, B \in M_n$. Suppose $\|\cdot\|$ is a matrix norm on $M_n$ and assume the given inequalities between $G(\cdot)$ and $\|\cdot\|$. Then

$$G(AB) \le C_M\|AB\| \le C_M\|A\|\,\|B\| \le \frac{C_M}{C_m^2}\,G(A)G(B)$$

and hence

$$c(G) \le \frac{C_M}{C_m^2}$$

If we take for the matrix norm the particular choice $\|\cdot\| = c(G)G(\cdot)$, then $C_M = C_m = 1/c(G)$, so $C_M/C_m^2 = c(G)$. □


Exercise. Show that if $k \ge c(G)$, then $kG(\cdot)$ is a matrix norm. In particular, show that $C_M G(\cdot)/C_m^2$ is always a matrix norm.


Exercise. Deduce the result for vector norms in (5.7.10) directly from 
(5.7.11). 


One of the consequences of submultiplicativity of matrix norms is the fact that associated with every matrix norm on $M_n$ there is some compatible vector norm on $C^n$. It is a consequence of this that $\|A\| \ge \rho(A)$ for every matrix norm $\|\cdot\|$; a vector norm on $M_n$ that satisfies this inequality for all $A \in M_n$ is said to be spectrally dominant. It is interesting to observe that some vector norms on $M_n$ have compatible vector norms on $C^n$ and some do not. Among those that do not, some are spectrally dominant and some are not. And a vector norm on $M_n$ can have a compatible vector norm on $C^n$ without being a matrix norm.


5.7.12 Definition. The vector norm $|\cdot|$ on $C^n$ is said to be compatible with the vector norm $G(\cdot)$ on $M_n$ if

$$|Ax| \le G(A)\,|x|$$

for all $x \in C^n$ and all $A \in M_n$. The term consistent is sometimes used, and the vector norm $|\cdot|$ is sometimes said to be subordinate to the generalized matrix norm $G(\cdot)$. These ideas have already been touched upon in the previous section [e.g., (5.6.27), (5.6.30)]. We emphasize the observations relevant here.


5.7.13 Theorem. If $\|\cdot\|$ is a matrix norm on $M_n$, then there is some vector norm on $C^n$ that is compatible with it.


Proof: If one defines $|x| = \|[x\ 0\ \ldots\ 0]\|$, then $|Ax| = \|[Ax\ 0\ \ldots\ 0]\| = \|A[x\ 0\ \ldots\ 0]\| \le \|A\|\,\|[x\ 0\ \ldots\ 0]\| = \|A\|\,|x|$. □


We already know the converse. Theorem (5.6.2) says that if $|\cdot|$ is a given vector norm on $C^n$, then there is a matrix norm [the induced norm (5.6.1)] that is compatible with it.


Exercise. Show that the compatible vector norm on $C^n$ guaranteed by (5.7.13) need not be unique. Indeed, $|x| = \|[x\ x\ \ldots\ x]\|$ also works, as does $|x| = \|xy^*\|$ for any nonzero vector $y \in C^n$.


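5.7.14 Theorem. Let $G(\cdot)$ be a vector norm on $M_n$ that has a compatible vector norm $|\cdot|$ on $C^n$. Then $G(A) \ge \rho(A)$ for all $A \in M_n$. More generally,

$$G(A_1)G(A_2) \cdots G(A_k) \ge \rho(A_1A_2 \cdots A_k) \qquad (5.7.15)$$

for all $A_1, A_2, \ldots, A_k \in M_n$ and all $k = 1, 2, \ldots$.


Proof: Suppose $k = 2$, and let $x \in C^n$ be a nonzero vector such that $A_1A_2x = \lambda x$ with $|\lambda| = \rho(A_1A_2)$. Then

$$\rho(A_1A_2)\,|x| = |\lambda x| = |A_1A_2x| = |A_1(A_2x)| \le G(A_1)\,|A_2x| \le G(A_1)G(A_2)\,|x|$$

Since $|x| \ne 0$, we conclude that $\rho(A_1A_2) \le G(A_1)G(A_2)$. The general case follows in the same way by induction. □


When does a given vector norm on $M_n$ have a compatible vector norm on $C^n$? The condition (5.7.15) is necessary; to show that it is also sufficient we need a technical lemma.


5.7.16 Lemma. Let $G(\cdot)$ be a vector norm on $M_n$ that satisfies (5.7.15), and let $\|\cdot\|_2$ denote the spectral norm on $M_n$. Then there is a finite positive constant $c$ such that

$$G(A_1)G(A_2) \cdots G(A_k) \ge c\,\|A_1A_2 \cdots A_k\|_2$$

for all $A_1, A_2, \ldots, A_k \in M_n$ and all $k = 1, 2, \ldots$.


Proof: By Corollary (5.4.5) there is a finite $b > 0$ such that $G(A) \le b\|A\|_2$ for all $A \in M_n$. Let $A_1A_2 \cdots A_k = V\Sigma W^*$ be a singular value decomposition, with $V, W \in M_n$ unitary and $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ with all $\sigma_i \ge 0$, such that $\max\{\sigma_1, \sigma_2, \ldots, \sigma_n\} = \|A_1A_2 \cdots A_k\|_2$. By (5.7.15) we have

$$G(V^*)G(A_1)G(A_2) \cdots G(A_k)G(W) \ge \rho(V^*A_1A_2 \cdots A_kW) = \rho(\Sigma) = \|A_1A_2 \cdots A_k\|_2$$

and hence

$$G(A_1)G(A_2) \cdots G(A_k) \ge \frac{1}{G(V^*)G(W)}\,\|A_1A_2 \cdots A_k\|_2 \ge \frac{1}{b^2\|V^*\|_2\|W\|_2}\,\|A_1A_2 \cdots A_k\|_2 = \frac{1}{b^2}\,\|A_1A_2 \cdots A_k\|_2$$

If we take $c = b^{-2}$, the assertion of the lemma is proved. □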


5.7.17 Theorem. Let $G(\cdot)$ be a vector norm on $M_n$. There is a vector norm $|\cdot|$ on $C^n$ such that

$$|Ax| \le G(A)\,|x| \quad\text{for all } x \in C^n \text{ and all } A \in M_n$$

if and only if

$$G(A_1)G(A_2) \cdots G(A_k) \ge \rho(A_1A_2 \cdots A_k)$$

for all $A_1, A_2, \ldots, A_k \in M_n$ and all $k = 1, 2, \ldots$.


Proof: Necessity has already been proved in Theorem (5.7.14). For sufficiency, we shall show that there is a matrix norm $\|\cdot\|$ on $M_n$ such that $G(A) \ge \|A\|$ for all $A \in M_n$. Let $|\cdot|$ be a vector norm on $C^n$ that is compatible with $\|\cdot\|$ [guaranteed to exist by Theorem (5.7.13)] and let $x \in C^n$ and $A \in M_n$ be given. Then $|Ax| \le \|A\|\,|x| \le G(A)\,|x|$, so we are done if we can construct a matrix norm that is dominated by $G(\cdot)$. For a given matrix $A \in M_n$, there are myriad ways to represent $A$ as a product of matrices or as a sum of products of matrices. Define

$$\|A\| = \inf\left\{ \sum_i G(A_{i1}) \cdots G(A_{ik_i}) : \sum_i A_{i1} \cdots A_{ik_i} = A \text{ and all } A_{ij} \in M_n \right\}$$

If $\sum_i A_{i1} \cdots A_{ik_i} = A$, then by Lemma (5.7.16) and the triangle inequality for the spectral norm we have

$$\sum_i G(A_{i1}) \cdots G(A_{ik_i}) \ge c \sum_i \|A_{i1} \cdots A_{ik_i}\|_2 \ge c\,\Big\|\sum_i A_{i1} \cdots A_{ik_i}\Big\|_2 = c\,\|A\|_2$$

From this inequality it follows that the constructed function $\|\cdot\|$ is positive. Homogeneity of $\|\cdot\|$ follows immediately from homogeneity of $G(\cdot)$. The triangle inequality and submultiplicativity for $\|\cdot\|$ follow from its definition as the infimum of sums of products. □


Exercise. Provide the details for the argument that the function $\|\cdot\|$ constructed in the previous theorem obeys the triangle inequality and is submultiplicative. Hint: If $C = A + B$ or $C = AB$, then every representation of $A$ and $B$ (separately) as a sum of products yields a representation of $C$ as a sum of products, but not all such representations of $C$ arise in this way.


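Exercise. Consider the vector norm (5.7.5) on $M_2$. If it had a compatible vector norm $|\cdot|$ on $C^2$, then show that

$$\left|\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right| = \left|\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right| \le G\!\left(\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\right)\left|\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right|$$

and

$$\left|\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right| = \left|\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right| \le G\!\left(\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\right)\left|\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right|$$

which implies that

$$\left|\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right| \le G\!\left(\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\right) G\!\left(\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\right)\left|\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right|$$

Show that this is not correct and conclude that this vector norm $G(\cdot)$ on $M_2$ cannot have a compatible vector norm on $C^2$.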


Exercise (continued). Even though the vector norm (5.7.5) on $M_2$ does not have any compatible vector norm on $C^2$, show directly that it is spectrally dominant. Discuss in light of Theorem (5.7.17).


We now know useful necessary and sufficient conditions for a vector norm on $M_n$ to have a compatible vector norm on $C^n$. We also know that whenever one has a vector norm on $C^n$, then the induced matrix norm (5.6.1) is a submultiplicative vector norm on $M_n$ that is compatible with it. When does a vector norm on $C^n$ have a compatible vector norm on $M_n$ that is not submultiplicative? Always.


5.7.18 Theorem. Let $|\cdot|$ be a given vector norm on $C^n$. Then there is a vector norm $G(\cdot)$ on $M_n$ that is not a matrix norm and is such that

$$|Ax| \le G(A)\,|x|$$

for all $x \in C^n$ and all $A \in M_n$.


Proof: Let $P \in M_n$ be any permutation matrix with a zero main diagonal - for example, $P = [p_{ij}]$, where $p_{ij} = 1$ if $j = i + 1$, or if $i = n$ and $j = 1$; otherwise, $p_{ij} = 0$. Let $\|\cdot\|$ denote the matrix norm on $M_n$ which is induced via (5.6.1) by the vector norm $|\cdot|$. Define $G(\cdot)$ on $M_n$ by

$$G(A) = \|A\| + \|P\|\,\|P^T\| \max_{1 \le i \le n} |a_{ii}|$$

Clearly, $G(\cdot)$ is a vector norm on $M_n$, $G(A) \ge \|A\|$ for all $A \in M_n$, and

$$|Ax| \le \|A\|\,|x| \le G(A)\,|x|$$

for all $A \in M_n$ and all $x \in C^n$. But $G(P) = \|P\|$, $G(P^T) = \|P^T\|$, and

$$G(PP^T) = G(I) = \|I\| + \|P\|\,\|P^T\| > \|P\|\,\|P^T\| = G(P)G(P^T)$$

Thus, $G(\cdot)$ is compatible with the given vector norm $|\cdot|$, but it is not submultiplicative. □


Exercise. Let $A = [a_{ij}] \in M_n$ and consider the following modification of the maximum row sum matrix norm:

$$\|A\| = \|A + \operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn})\|_\infty$$

Show that this is a Hadamard product norm of the form (5.7.2) and hence is a vector norm on $M_n$. Show that this norm is compatible with the vector norm $|\cdot|_\infty$ on $C^n$, and show that this norm is not submultiplicative.
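The construction in the proof of Theorem (5.7.18) can be made concrete. The sketch below takes $|\cdot|$ to be the max norm on $C^3$, so that the induced norm is the maximum row sum norm, and uses a cyclic permutation matrix for $P$; these choices, the sample matrix, and the helper names are illustrative assumptions.

```python
def norm_inf(M):
    """Maximum row sum norm, induced by the max norm on vectors."""
    return max(sum(abs(v) for v in row) for row in M)

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 3
# Cyclic permutation matrix: zero main diagonal.
P = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
Pt = [[P[j][i] for j in range(n)] for i in range(n)]
scale = norm_inf(P) * norm_inf(Pt)

def G(A):
    """G(A) = ||A|| + ||P|| ||P^T|| max_i |a_ii|, as in the proof."""
    return norm_inf(A) + scale * max(abs(A[i][i]) for i in range(n))

G_P, G_Pt = G(P), G(Pt)
G_PPt = G(matmul(P, Pt))          # P P^T is the identity

# Compatibility check on one hypothetical matrix and vector.
A = [[1, 2, 0], [0, -1, 4], [3, 0, 2]]
x = [1, -2, 3]
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
lhs = max(abs(v) for v in Ax)                 # |Ax|
rhs = G(A) * max(abs(v) for v in x)           # G(A) |x|
```

Here $G(PP^T) = 2$ exceeds $G(P)G(P^T) = 1$, while the compatibility inequality holds for the sample pair.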


Problems

1. Let $G(\cdot)$ be a vector norm on $M_n$ and let $y \in C^n$ be a given nonzero vector. Show that the function

$$|x| = G(xy^*)$$

is a vector norm on $C^n$. What is this norm when $y = [1, 1, \ldots, 1]^T$ or $y = [1, 0, 0, \ldots, 0]^T$?


2. Let $|\cdot|$ be any vector norm on $M_n$, let $A \in M_n$ be given, and let $\varepsilon > 0$ be given. Show that there exists some $K = K(\varepsilon, A) > 0$ such that

$$[\rho(A) - \varepsilon]^k \le |A^k| \le [\rho(A) + \varepsilon]^k$$

for all $k > K$.


3. Let $|\cdot|$ be any vector norm on $M_n$ and let $A \in M_n$ be given. (a) Use Problem 2 to show that if $\rho(A) < 1$, then $|A^k| \to 0$ as $k \to \infty$. At what rate? (b) Conversely, if $|A^k| \to 0$ as $k \to \infty$, show that $\rho(A) < 1$. Hint: Consider $|A^k[x\ \ldots\ x]|$ if $Ax = \lambda x$ and $x \ne 0$. (c) What can you say about convergence of power series of matrices using vector norms?


4. Let $G(\cdot)$ be a given vector norm on $M_n$, and define the function $G' : M_n \to R$ by

$$G'(B) = \max_{G(A) = 1} G(BA)$$

Show that $G'(\cdot)$ is always a matrix norm on $M_n$. Show that $G'(I) = 1$ always. If $G(I) = 1$, show that $G'(B) \ge G(B)$ for all $B \in M_n$.


The next four problems extend Problem 4.


5. If $G(\cdot)$ is a matrix norm on $M_n$, show that $G'(B) \le G(B)$ for all $B \in M_n$, and if $G(I) = 1$, then $G'(\cdot) = G(\cdot)$.


6. Show that $G''(\cdot) = G'(\cdot)$ always.


7. If $G(\cdot)$ is a vector norm on $M_n$ such that $G(I) = 1$, show that $G(\cdot)$ is a matrix norm if and only if $G'(B) \le G(B)$ for all $B \in M_n$.


8. Show that one could reverse the order in which $A$ and $B$ appear in the definition of $G'(\cdot)$ in Problem 4 and thereby obtain another matrix norm. Show with an example that this other norm need not be equal to $G'(\cdot)$.


9. Show that the set of all vector seminorms on C” that are compatible 
with a given vector norm on M, is a convex set - it is in fact a cone. 


10. Show that the numerical radius r(+) is not a matrix norm on M, by 
considering the matrices (5.7.4) and comparing r(AB) with r(A)r(B). 


11. The inequality $\|A\| \ge \rho(A)$ in Theorem (5.6.9) follows from the submultiplicativity axiom (4) for a matrix norm $\|\cdot\|$. But it is possible for a vector norm on $M_n$ to satisfy this inequality (i.e., to be spectrally dominant) without being a matrix norm. Show that $r(A) \ge \rho(A)$ for every $A \in M_n$. Show more generally that $\sigma(A) \subset \{x^*Ax : x^*x = 1\}$.


12. Show that the vector norm $|\cdot|_\infty$ on $M_n$ cannot have any compatible vector norm on $C^n$. Hint: Consider $|J_n|_\infty$ and $\rho(J_n)$. However, show that $n|\cdot|_\infty$ is a matrix norm on $M_n$, and hence it has a compatible vector norm on $C^n$.

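13. For $A = [a_{ij}] \in M_{m,n}$ denote the transpose of the $i$th row of $A$ by $r_i(A) = [a_{i1}, a_{i2}, \ldots, a_{in}]^T$ and the $j$th column of $A$ by $c_j(A) = [a_{1j}, a_{2j}, \ldots, a_{mj}]^T$, and suppose that $|\cdot|_\alpha$ and $|\cdot|_\beta$ are vector norms on $C^n$ and $C^m$, respectively. Then define $G_{\beta,\alpha} : M_{m,n} \to R$ by

$$G_{\beta,\alpha}(A) = \left|\,[\,|r_1(A)|_\alpha,\ |r_2(A)|_\alpha,\ \ldots,\ |r_m(A)|_\alpha\,]^T\right|_\beta$$

Similarly, define $G^{\alpha,\beta} : M_{m,n} \to R$ by

$$G^{\alpha,\beta}(A) = \left|\,[\,|c_1(A)|_\beta,\ |c_2(A)|_\beta,\ \ldots,\ |c_n(A)|_\beta\,]^T\right|_\alpha$$

Show that $G_{\beta,\alpha}(\cdot)$ and $G^{\alpha,\beta}(\cdot)$ are each vector norms on $M_{m,n}$, but that $G^{\alpha,\beta}(\cdot)$ is not necessarily the same as $G_{\beta,\alpha}(\cdot)$. Note that these are natural ways to define vector norms on the space of rectangular matrices.


14. Compare $G_{\beta,\alpha}(\cdot)$ of Problem 13 to the norms $\|\cdot\|_{\alpha,\beta}$ defined in Problem 4 of Section (5.6), and show by example that even when $m = n$ (and even when $|\cdot|_\alpha = |\cdot|_\beta$) $G_{\beta,\alpha}(\cdot)$ need not be a matrix norm on $M_n$.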


15. In Problem 13, when Jla= lh = \e|g, what norm is Gg, a(*)? How 


about G%8(+)? 
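16. In Problem 13, when $|\cdot|_\alpha = |\cdot|_1$ and $|\cdot|_\beta = |\cdot|_\infty$, what norm is $G_{\beta,\alpha}(\cdot)$? How about $G^{\alpha,\beta}(\cdot)$? What about $G_{\alpha,\beta}(\cdot)$ and $G^{\beta,\alpha}(\cdot)$?

17. If $G(\cdot)$ is a vector norm on $M_n$, the spectral characteristic of $G(\cdot)$ is defined as

$$m(G) = \max_{G(A) \le 1} \rho(A)$$

Show that $G(\cdot)$ is spectrally dominant if and only if $m(G) \le 1$, and show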
that any vector norm on M,, may be converted into a spectrally dominant 
norm via multiplication by a constant - with the minimum constant 
necessary being m(G). A norm G(+) on M, is called minimally spectrally 
dominant if m(G) =1. 

18. Show that any induced matrix norm is minimally spectrally domi- 
nant, as defined in Problem 17. Show that there are norms that are mini- 
mally spectrally dominant but are not induced. Show that the numerical 
radius r(A) is minimally spectrally dominant. 

19. Show that the spectral characteristic is a convex function on the 
cone of vector norms on M,, and therefore that the set of all spectrally 

dominant vector norms on M,, is convex. 


20. Show that a vector norm $G(\cdot)$ on $M_n$ is spectrally dominant if and only if for each $A \in M_n$ there is a constant $\gamma_A$ [depending only upon $G(\cdot)$ and $A$] such that for all integers $k > 0$

$$G(A^k) \le \gamma_A\,G(A)^k$$


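21. (a) Show that the numerical radius $r(\cdot)$ satisfies $r(A) = \rho(A) = \|A\|_2$ whenever $A$ is normal, but that $r(A) \le \|A\|_2$ in general. Give an example of an $A \in M_n$ such that $r(A) < \|A\|_2$. Hint: Show that $r(U^*AU) = r(A)$ whenever $U \in M_n$ is unitary and use the fact that a normal $A$ is unitarily diagonalizable. Then observe that

$$r(A) = \max_{|x|_2 = 1} |x^*Ax| \le \max_{|x|_2 = 1} |Ax|_2\,|x|_2 = \|A\|_2$$

in general. (b) Show that $r(A) = r(A^*)$ for all $A \in M_n$. (c) Show that $\|A\|_2 \le 2r(A)$ for all $A \in M_n$ as follows: write

$$A = (A + A^*)/2 + (A - A^*)/2 \equiv A_1 + A_2$$

and observe that $A_1$ and $A_2$ are normal. Now show that

$$\|A\|_2 \le \|A_1\|_2 + \|A_2\|_2 = r(A_1) + r(A_2) \le r(A) + r(A^*) = 2r(A)$$

(d) Show that the bounds

$$\tfrac{1}{2}\|A\|_2 \le r(A) \le \|A\|_2 \qquad (i)$$

proved in (a) and (c) are sharp, by considering suitable $n$-by-$n$ versions of $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.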

22. Use the inequalities in (d) of Problem 21 and the bound given in Theorem (5.7.11) to show that the function $4r(\cdot)$ is a matrix norm on $M_n$. Show that $c(r) = 4$ by considering $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $A^*$, and $AA^*$.


23. Deduce from (i) in Problem 21(d) and the inequality

$$\frac{1}{\sqrt{n}}\,|A|_2 \le \|A\|_2 \le |A|_2 \qquad (ii)$$

proved in Problem 23, Section (5.6), that

$$\frac{1}{2\sqrt{n}}\,|A|_2 \le r(A) \le |A|_2 \qquad (iii)$$

for all $A \in M_n$ and show that the upper bound is sharp. Verify that $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $A = I$ are examples of equality in the lower bounds of (i) and (ii), respectively, and that $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ is an example of equality in the upper bounds of (i) and (ii). Explain why the upper bound in (iii) must, therefore, be sharp, and give an example of a case of equality. The lower bound in (iii), however, is not sharp. Why is there a finite maximal positive constant $c_n$ such that $c_n|A|_2 \le r(A)$ for all $A \in M_n$? Actually, $c_n = (2n)^{-1/2}$ for even $n$ and $c_n = (2n - 1)^{-1/2}$ for odd $n$. For even $n$ the cases of equality are the matrices unitarily similar to a direct sum of matrices of the form $r(A)\begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$; an additional single 1-by-1 direct summand $[\alpha]$, $|\alpha| = r(A)$, must be included when $n$ is odd.


24. Show that $[A, B] \equiv \operatorname{tr} AB^*$ defines an inner product on $M_n$ and that the $l_2$ norm on $M_n$ is derived from $[\cdot, \cdot]$; that is, $|A|_2 = [A, A]^{1/2}$ for all $A \in M_n$. Show that if $X = xx^*$ is a Hermitian rank 1 matrix, then $|X|_2 = |x|_2^2$. Show that the field of values of a given matrix $A \in M_n$ is just the set of projections (in the inner product $[\cdot, \cdot]$) of $A$ onto the set of unit norm rank 1 Hermitian matrices, and that $r(A) = \max\{|[A, X]| : X$ is a rank 1 Hermitian matrix and $|X|_2 = 1\}$. Use the Cauchy-Schwarz inequality to show that $r(A) \le |A|_2$.


25. The numerical radius is related to a natural approximation problem. Let $A \in M_n$ be given, and suppose we wish to approximate $A$ as well as possible in the sense of least squares by a scalar multiple of a Hermitian matrix of rank 1. If we write $X = cxx^*$, $c \in C$, $|x|_2 = 1$, then show that

$$|A - X|_2^2 = |A - cxx^*|_2^2 \ge |A|_2^2 - 2|c|\,|[A, xx^*]| + |c|^2$$

is minimized when $c = [A, \hat{x}\hat{x}^*]$ and $\hat{x}$ is a unit vector for which the maximum in (5.7.6) is achieved. Conclude that if $|A - cX|_2$ is minimum among all scalars $c$ and all rank 1 Hermitian matrices $X$ with $|X|_2 = 1$, then $|c| = r(A)$.


26. The preceding two problems suggest a natural generalization of the numerical radius and the field of values. Let $\Phi \subset M_n$ be a nonempty set of matrices such that
(a) If $X \in \Phi$, then $\alpha X \in \Phi$ for all $\alpha \in \mathbf{C}$;
(b) $[A, X] \equiv \operatorname{tr} AX^* = 0$ for all $X \in \Phi$ if and only if $A = 0$; and
(c) $\Phi$ is a closed set.
If $A \in M_n$, define

$$\phi(A) \equiv \max_{X \in \Phi,\ |X|_2 \le 1} |[A, X]| = \max_{X \in \Phi,\ |X|_2 = 1} |\operatorname{tr} AX^*|$$

Show that $\phi(\cdot)$ is well defined, is a vector norm on $M_n$, and satisfies $|\phi(A)| \le |A|_2$. Show that for each $A \in M_n$ there is some $X_A \in \Phi$ such that $|X_A|_2 = 1$ and $\phi(A) = |[A, X_A]|$.
Consider the problem of approximating a given $A \in M_n$ by matrices from $\Phi$. That is, find some $X \in \Phi$ such that $|A - X|_2$ is minimized. Show


that such an approximation is given by the matrix $\phi(A)X_A$, where $\phi(A) = |[A, X_A]|$, and that the error in the approximation of $A$ by any $X \in \Phi$ has the sharp bound $|A - X|_2^2 \ge |A|_2^2 - |[A, X_A]|^2 \ge 0$. Show that if $\Phi$ is the set of all scalar multiples of rank 1 Hermitian matrices, then $\phi(A) = r(A)$. The case in which $\Phi$ is the set of all scalar multiples of unitary matrices is discussed in Example (7.4.6); in this case $\phi(A)$ is the average of the singular values of $A$. Another interesting case is when $\Phi$ is the set of all singular matrices, discussed in Example (7.4.1); in this case $\phi(A)$ is the smallest singular value of $A$. Other interesting cases are the set of all scalar multiples of positive definite, Hermitian, or [...] matrices of a given rank, or the set of all scalar multiples of all matrices unitarily similar to a given matrix. In all of these cases, the analog of the field of values is the set $\{[A, X] : X \in \Phi,\ |X|_2 = 1\}$.


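27. Even though the numerical radius $r(A)$ is not a matrix norm, it does satisfy the power inequality $r(A^m) \le [r(A)]^m$ for all $m = 1, 2, \ldots$ and all $A \in M_n$. Show this with the following steps:

(a) Show that it is sufficient to prove that if $r(A) \le 1$, then $r(A^m) \le 1$ for all $m = 1, 2, \ldots$. Let $m \ge 2$ be a given positive integer, fixed for the rest of the argument, and let $\{w_k\} = \{e^{2\pi i k/m}\}_{k=1}^m$ denote the set of $m$th roots of unity. Notice that $\{w_k\}$ is a finite multiplicative group and that $\{w_j w_k\}_{k=1}^m = \{w_k\}_{k=1}^m$ for each $j = 1, 2, \ldots, m$.

(b) Observe that

$$1 - z^m = \prod_{k=1}^m (1 - w_k z)$$

and show that

$$p(z) \equiv \frac{1}{m} \sum_{j=1}^m \prod_{\substack{k=1 \\ k \ne j}}^m (1 - w_k z) = 1 \quad \text{for all } z \in \mathbf{C}$$

Hint: Notice that $p(z)$ is a polynomial of degree at most $m-1$, and that

$$p(z) = \frac{1}{m} \sum_{j=1}^m \frac{1 - z^m}{1 - w_j z}$$

so that $p(z) = p(w_1 z) = \cdots = p(w_m z)$ for all $z \in \mathbf{C}$. Hence $p(z) = \text{constant} = p(0) = 1$.

(c) Show that

$$I - A^m = \prod_{k=1}^m (I - w_k A) \quad \text{and} \quad I = \frac{1}{m} \sum_{j=1}^m \prod_{\substack{k=1 \\ k \ne j}}^m (I - w_k A)$$

(d) Let $x \in \mathbf{C}^n$ be any unit vector, $|x|_2 = 1$, and let $A \in M_n$. Verify that

$$\begin{aligned}
1 - x^*A^m x &= x^*(I - A^m)x = (Ix)^*(I - A^m)x \\
&= \Big[\frac{1}{m} \sum_{j=1}^m \prod_{\substack{k=1 \\ k \ne j}}^m (I - w_k A)x\Big]^* \Big[\prod_{k=1}^m (I - w_k A)x\Big] \\
&= \frac{1}{m} \sum_{j=1}^m z_j^*[(I - w_j A)z_j], \qquad z_j \equiv \prod_{\substack{k=1 \\ k \ne j}}^m (I - w_k A)x \\
&= \frac{1}{m} \sum_{\substack{j=1 \\ z_j \ne 0}}^m |z_j|_2^2 \Big[1 - w_j \Big(\frac{z_j}{|z_j|_2}\Big)^* A \Big(\frac{z_j}{|z_j|_2}\Big)\Big]
\end{aligned}$$

(e) Now replace $A$ by $e^{i\theta}A$ in the identity in (d) to get

$$1 - e^{im\theta} x^*A^m x = \frac{1}{m} \sum_{\substack{j=1 \\ z_j \ne 0}}^m |z_j|_2^2 \Big[1 - e^{i\theta} w_j \Big(\frac{z_j}{|z_j|_2}\Big)^* A \Big(\frac{z_j}{|z_j|_2}\Big)\Big]$$

for any real $\theta$. Now suppose that $r(A) \le 1$, and show that the real part of the right-hand side of this identity is nonnegative for any $\theta \in \mathbf{R}$, and deduce that the real part of the left-hand side must also be nonnegative for all $\theta \in \mathbf{R}$. Since $\theta$ is arbitrary, argue that this implies that $|x^*A^m x| \le 1$ and hence that $r(A^m) \le 1$.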


28. Even though the numerical radius satisfies the power inequality $r(A^m) \le r(A)^m$, it is not always true that $r(A^{k+m}) \le r(A^k)r(A^m)$. Verify this by considering $A = J_4(0)$ (the 4-by-4 Jordan block matrix), $k = 1$, and $m = 2$. Hint: Use the arithmetic-geometric mean inequality to show that $r(A^2) = r(A^3) = \tfrac{1}{2}$ and the Cauchy-Schwarz inequality to show that $r(A) < 1$.


29. Is there a sensible notion of “minimal vector norm” on $M_n$ analogous to the concept of minimal matrix norm (5.6.29)?


Further Readings. For more discussion of inequalities involving the numerical radius see M. Goldberg and E. Tadmor, “On the Numerical Radius and Its Applications,” Lin. Alg. Appl. 42 (1982), 263-284. The proof of the power inequality for the numerical radius in Problem 27 is taken from C. Pearcy, “An Elementary Proof of the Power Inequality for the Numerical Radius,” Michigan Math. J. 13 (1966), 289-291. Some of the material of this section was developed by C. R. Johnson in “Multiplicativity and Compatibility of Generalized Matrix Norms,” Linear Alg. Appl. 16 (1977), 25-37, “Locally Compatible Generalized Matrix Norms,” Numer. Math. 27 (1977), 391-394, and “Power Inequalities and Spectral Dominance of Generalized Matrix Norms,” Linear Alg. Appl. 28 (1979), 117-130, where further results may be found.


5.8 Errors in inverses and solutions of linear systems 


As an application of matrix and vector norms, we consider the problem 
of estimating the error made in computing the inverse of a matrix and the 
solution to a system of linear equations. 

If a nonsingular matrix $A \in M_n$ is given, we may imagine that we can compute the inverse matrix $A^{-1}$ exactly, but if the computations are performed on a digital computer with a finite-length machine word, there are inevitable and unavoidable errors of rounding and truncation. Furthermore, even if all the computations could be performed with perfect accuracy, it could be that the elements of the matrix $A$ are the result of some experiment or of some calculation that is subject to errors, and therefore they may not be known with perfect accuracy. How do errors in the computation and errors in the data affect the computed matrix inverse?

It turns out for many common algorithms that round-off errors in the computations can be modeled in the same way as errors in the data. That is, let us suppose $A \in M_n$ is a given nonsingular matrix, and we wish to compute $A^{-1}$, but what we actually compute is $(A+E)^{-1}$, where $E \in M_n$ is “small” enough so that $A+E$ is invertible. Then the error is $A^{-1} - (A+E)^{-1} = A^{-1} - (I + A^{-1}E)^{-1}A^{-1}$. If $\rho(A^{-1}E) < 1$, then $A+E$ will be invertible and we can write $(I + A^{-1}E)^{-1}$ as a power series in $A^{-1}E$. This gives

$$A^{-1} - (A+E)^{-1} = A^{-1} - \sum_{k=0}^{\infty} (-1)^k (A^{-1}E)^k A^{-1} = \sum_{k=1}^{\infty} (-1)^{k+1} (A^{-1}E)^k A^{-1}$$

Thus, we have an exact formula for the error

$$A^{-1} - (A+E)^{-1} = \sum_{k=1}^{\infty} (-1)^{k+1} (A^{-1}E)^k A^{-1} \quad \text{if } \rho(A^{-1}E) < 1 \tag{5.8.1}$$
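As a numerical sanity check of (5.8.1), the following sketch sums the series for one 2-by-2 pair and compares the result with the directly computed difference of inverses. The matrices `A` and `E`, the helper names, and the 60-term cutoff are illustrative assumptions, not taken from the text.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def series_error(A, E, terms=60):
    """Partial sum of sum_{k>=1} (-1)^(k+1) (A^-1 E)^k A^-1."""
    Ainv = inv2(A)
    T = matmul(Ainv, E)                      # T = A^-1 E
    power = [[1.0, 0.0], [0.0, 1.0]]
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(1, terms + 1):
        power = matmul(power, T)             # power = T^k
        term = matmul(power, Ainv)
        sign = 1.0 if k % 2 == 1 else -1.0   # (-1)^(k+1)
        total = [[total[i][j] + sign * term[i][j] for j in range(2)]
                 for i in range(2)]
    return total

# Hypothetical data: here the spectral radius of A^-1 E is 0.08 < 1.
A = [[4.0, 1.0], [2.0, 3.0]]
E = [[0.1, -0.2], [0.05, 0.1]]
A_plus_E = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
Ainv, AEinv = inv2(A), inv2(A_plus_E)
exact = [[Ainv[i][j] - AEinv[i][j] for j in range(2)] for i in range(2)]
approx = series_error(A, E)
max_gap = max(abs(exact[i][j] - approx[i][j])
              for i in range(2) for j in range(2))
```

For this choice the two computations should agree to within rounding error, since the terms of the series decay geometrically.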


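Now suppose that $\|\cdot\|$ is a given matrix norm, and assume that $\|A^{-1}E\| < 1$, so that, in particular, $\rho(A^{-1}E) < 1$ and (5.8.1) holds. Then

$$\|A^{-1} - (A+E)^{-1}\| = \Big\|\sum_{k=1}^{\infty} (-1)^{k+1} (A^{-1}E)^k A^{-1}\Big\| \le \sum_{k=1}^{\infty} \|A^{-1}E\|^k \|A^{-1}\| = \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}\,\|A^{-1}\|$$

and we conclude that an upper bound on the relative error made in computing the inverse is

$$\frac{\|A^{-1} - (A+E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \quad \text{if } \|A^{-1}E\| < 1 \tag{5.8.2}$$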


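If we assume, in addition, that $E$ is “small” enough so that $\|E\| < 1/\|A^{-1}\|$, then $\rho(A^{-1}E) \le \|A^{-1}E\| \le \|A^{-1}\|\,\|E\| < 1$ and we have the estimate

$$\frac{\|A^{-1} - (A+E)^{-1}\|}{\|A^{-1}\|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \le \frac{\|A^{-1}\|\,\|A\|\,(\|E\|/\|A\|)}{1 - \|A^{-1}\|\,\|A\|\,(\|E\|/\|A\|)}$$

The quantity

$$\kappa(A) \equiv \begin{cases} \|A^{-1}\|\,\|A\| & \text{if } A \text{ is nonsingular} \\ \infty & \text{if } A \text{ is singular} \end{cases} \tag{5.8.3}$$

is called the condition number for matrix inversion with respect to the matrix norm $\|\cdot\|$. Notice that $\kappa(A) = \|A^{-1}\|\,\|A\| \ge \|A^{-1}A\| = \|I\| \ge 1$ for any matrix norm.

Using this notation, we have the estimate

$$\frac{\|A^{-1} - (A+E)^{-1}\|}{\|A^{-1}\|} \le \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \cdot \frac{\|E\|}{\|A\|} \quad \text{if } \|A^{-1}\|\,\|E\| < 1 \tag{5.8.4}$$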


which bounds the relative error in the inverse in terms of the relative error in the data. For $\|E\|$ small, the right-hand side is of the order of $\kappa(A)\|E\|/\|A\|$, so we have good reason to believe that the relative error in the inverse is of the same order as the relative error in the data, provided that $\kappa(A)$ is not large. For purposes of inversion, we say that $A$ is ill conditioned or poorly conditioned (with respect to the matrix norm $\|\cdot\|$) if $\kappa(A)$ is large; if $\kappa(A)$ is small (near 1), we say that $A$ is well conditioned (with respect to the matrix norm $\|\cdot\|$); if $\kappa(A) = 1$, we say that $A$ is perfectly conditioned (with respect to the matrix norm $\|\cdot\|$).

There is an interesting geometric characterization of the condition number in the common case that the norm used is the spectral norm. Let $\theta(A)$ denote the least angle between the vectors $Ax$ and $Ay$ as $x$ and $y$ range over all pairs of orthonormal vectors. Using the spectral norm, $\kappa(A) = \cot[\theta(A)/2]$. Thus, if $A$ is unitary, $\theta(A) = \pi/2$ and $\cot(\pi/4) = 1 = \kappa(A)$. If $A$ is “nearly singular,” then there is some orthonormal pair $x$, $y$ such that $Ax$ is “nearly parallel” to $Ay$, $\theta(A)$ will be small, and $\kappa(A) = \cot[\theta(A)/2]$ will be large. For more details, see Example (7.4.26).
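To make the estimate (5.8.4) concrete, the sketch below computes the condition number of a nearly singular 2-by-2 matrix in the maximum row sum norm and compares the actual relative error in the inverse with the bound. The matrices, the value of `eps`, and all function names are illustrative assumptions.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Maximum absolute row sum norm, which is a matrix norm."""
    return max(sum(abs(v) for v in row) for row in M)

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

eps = 1e-3
A = [[1.0, -1.0], [-1.0, 1.0 + eps]]   # nearly singular for small eps
E = [[1e-6, 0.0], [0.0, -1e-6]]        # hypothetical small error in the data

kappa = norm_inf(inv2(A)) * norm_inf(A)                 # condition number (5.8.3)
rel_data = norm_inf(E) / norm_inf(A)                    # relative error in the data
rel_inverse = (norm_inf(sub(inv2(A), inv2(add(A, E))))
               / norm_inf(inv2(A)))                     # relative error in the inverse
bound = kappa * rel_data / (1.0 - kappa * rel_data)     # right-hand side of (5.8.4)
hypothesis_holds = norm_inf(inv2(A)) * norm_inf(E) < 1.0
```

Here `kappa` is roughly 4000, so the bound is about 4000 times the relative error in the data; the actual error for this particular `E` turns out to be much smaller than the bound, which is permitted since (5.8.4) is only an upper estimate.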


Exercise. Show that if $A \in M_n$ is invertible, then $\kappa(A) = \kappa(A^{-1})$.

Exercise. Show that if $U, V \in M_n$ are unitary and if the spectral norm (or any other unitarily invariant norm) is used, then $\kappa(A) = \kappa(UA) = \kappa(AV) = \kappa(UAV)$. Thus, unitary transformations of a given matrix do not make it any more ill conditioned than it is already. This observation underlies many stable numerical linear algebra algorithms.

Exercise. Show that $\kappa(AB) \le \kappa(A)\kappa(B)$ always. Is $\kappa(\cdot)$ a matrix or vector norm on $M_n$?


These same considerations can be used to give a priori bounds on the accuracy of a solution to a system of linear equations. Suppose one wishes to solve

$$Ax = b, \qquad A \in M_n,\ b \in \mathbf{C}^n \tag{5.8.5}$$

but because of computational errors or uncertainty in the data one actually solves

$$(A+E)\hat{x} = b, \qquad A, E \in M_n,\ b \in \mathbf{C}^n \tag{5.8.6}$$

What can we say about the error $x - \hat{x}$?

If $E$ is “small” enough so that $\rho(A^{-1}E) < 1$, then by (5.8.1) we have

$$x - \hat{x} = A^{-1}b - (A+E)^{-1}b = [A^{-1} - (A+E)^{-1}]b = \sum_{k=1}^{\infty} (-1)^{k+1} (A^{-1}E)^k A^{-1}b = \sum_{k=1}^{\infty} (-1)^{k+1} (A^{-1}E)^k x$$

If $\|\cdot\|$ is a matrix norm such that $\|A^{-1}E\| < 1$, and if $|\cdot|$ is a compatible vector norm, then an upper bound on the norm of the error is

$$|x - \hat{x}| \le \sum_{k=1}^{\infty} \|A^{-1}E\|^k\,|x| = \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|}\,|x|$$


In terms of relative errors, this says that

$$\frac{|x - \hat{x}|}{|x|} \le \frac{\|A^{-1}E\|}{1 - \|A^{-1}E\|} \quad \text{if } \|A^{-1}E\| < 1 \tag{5.8.7}$$

and if $|\cdot|$ is a vector norm that is compatible with the matrix norm $\|\cdot\|$. Notice that this is the same as the upper bound (5.8.2) on the relative error of the inverse, and it is independent of the right-hand-side $b$ of the system of linear equations.

In terms of the condition number of $A$, the same argument used to derive (5.8.4) shows that the relative error in the solution to (5.8.5) has the bound

$$\frac{|x - \hat{x}|}{|x|} \le \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \cdot \frac{\|E\|}{\|A\|} \quad \text{if } \|A^{-1}\|\,\|E\| < 1 \tag{5.8.8}$$

and if the vector norm $|\cdot|$ is compatible with the matrix norm $\|\cdot\|$. Whatever algorithm is used to solve the linear equations (5.8.5), the relative error in the solution has the same bound as the relative error in the inverse of the matrix of coefficients.

It may be that the ideal system of linear equations (5.8.5) is in practice subject to uncertainty about the elements of the right-hand-side $b$ as well as about the elements of the coefficient matrix $A$. Thus, we may wish to replace (5.8.6) with

$$(A+E)\hat{x} = b + e \tag{5.8.9}$$

where $E \in M_n$ and $e \in \mathbf{C}^n$ are thought of as “small” errors in the data. Using the same methods, one finds (if $b \ne 0$) that a bound for the relative error between the solutions to (5.8.5) and (5.8.9) is given by

$$\frac{|x - \hat{x}|}{|x|} \le \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \cdot \frac{\|E\|}{\|A\|} + \frac{\kappa(A)}{1 - \kappa(A)(\|E\|/\|A\|)} \cdot \frac{|e|}{|b|} \tag{5.8.10}$$

under the same hypotheses as for (5.8.8). Thus, the relative error bound has two terms, one for the relative error in the coefficients $A$ and one for the relative error in the right-hand-side $b$. The condition number $\kappa(A)$ again plays a crucial role in determining the sensitivity of the bound on the solution error to errors in the data.
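The two-term bound (5.8.10) can be checked numerically on a small system. The sketch below uses the maximum row sum norm together with the compatible max norm on vectors; the data `A`, `b`, `E`, `e` and the helper names are illustrative assumptions.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Maximum absolute row sum norm on matrices."""
    return max(sum(abs(v) for v in row) for row in M)

def vec_norm(v):
    """Max norm on vectors, compatible with norm_inf."""
    return max(abs(t) for t in v)

def solve2(M, v):
    """Solve M y = v for a 2-by-2 matrix M."""
    Minv = inv2(M)
    return [sum(Minv[i][j] * v[j] for j in range(2)) for i in range(2)]

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
E = [[0.01, -0.02], [0.0, 0.01]]   # hypothetical error in the coefficients
e = [0.001, -0.002]                # hypothetical error in the right-hand side

x = solve2(A, b)                                        # exact system (5.8.5)
A_pert = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
b_pert = [b[i] + e[i] for i in range(2)]
x_hat = solve2(A_pert, b_pert)                          # perturbed system (5.8.9)

kappa = norm_inf(inv2(A)) * norm_inf(A)
rel_A = norm_inf(E) / norm_inf(A)
rel_b = vec_norm(e) / vec_norm(b)
rel_error = vec_norm([x[i] - x_hat[i] for i in range(2)]) / vec_norm(x)
bound = kappa / (1.0 - kappa * rel_A) * (rel_A + rel_b)  # right side of (5.8.10)
```

For this well-conditioned example (`kappa` is 3) the observed relative error and the bound are of the same order of magnitude.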

All our estimates so far have been a priori bounds on the error; they do not involve the computed solution or any quantity derived from it. Suppose, however, that some computed “solution” $\hat{x}$ to the system (5.8.5) has been found. It may not be the case that $A\hat{x} = b$ exactly, but from the residual vector $r = b - A\hat{x}$ we can obtain an estimate of how close $\hat{x}$ is to the true solution $x$. Since $A^{-1}r = A^{-1}[b - A\hat{x}] = A^{-1}b - \hat{x} = x - \hat{x}$, we have the straightforward bound $|x - \hat{x}| \le |A^{-1}r|$. If $\|\cdot\|$ is a matrix norm that is compatible with the vector norm $|\cdot|$, we have $|b| = |Ax| \le \|A\|\,|x|$, or $1 \le \|A\|\,|x|/|b|$ if $b \ne 0$, and hence

$$|x - \hat{x}| = |A^{-1}r| \le \|A^{-1}\|\,|r| \le \|A^{-1}\|\,|r|\,\frac{\|A\|\,|x|}{|b|} = \|A\|\,\|A^{-1}\|\,\frac{|r|}{|b|}\,|x|$$

Thus, if $b \ne 0$, the relative error between the computed solution $\hat{x}$ (such that $A\hat{x} = b - r$) and the true solution $x$ (such that $Ax = b$) has the bound

$$\frac{|x - \hat{x}|}{|x|} \le \kappa(A)\,\frac{|r|}{|b|} \tag{5.8.11}$$

where we assume that the matrix norm used to compute the condition number $\kappa(A)$ is compatible with the vector norm $|\cdot|$. For a well-conditioned problem, the relative error in the solution is not (much) worse than the relative size of the residual; for an ill-conditioned problem, even a computed solution that yields a small residual may still be very far from the true solution.
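The gap between a small residual and a large error can be seen in a minimal sketch. The nearly singular matrix, the vectors `x` and `x_hat`, and the helper names below are illustrative assumptions; the approximate solution is displaced along a direction that `A` almost annihilates.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def norm_inf(M):
    """Maximum absolute row sum norm on matrices."""
    return max(sum(abs(v) for v in row) for row in M)

def vec_norm(v):
    """Max norm on vectors, compatible with norm_inf."""
    return max(abs(t) for t in v)

def apply(M, v):
    """Matrix-vector product for a 2-by-2 matrix."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

eps = 1e-4
A = [[1.0, -1.0], [-1.0, 1.0 + eps]]   # ill conditioned for small eps
x = [1.0, -1.0]                        # true solution
b = apply(A, x)
x_hat = [1.5, -0.5]                    # x + 0.5*[1, 1]; A*[1, 1] = [0, eps]

Ax_hat = apply(A, x_hat)
r = [b[i] - Ax_hat[i] for i in range(2)]                # residual vector
rel_residual = vec_norm(r) / vec_norm(b)
rel_error = vec_norm([x[i] - x_hat[i] for i in range(2)]) / vec_norm(x)
kappa = norm_inf(inv2(A)) * norm_inf(A)
bound = kappa * rel_residual                            # right side of (5.8.11)
```

The relative residual is of order `eps` while the relative error is 0.5, and (5.8.11) still holds because `kappa` is of order `1/eps`.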

As a final remark about norm estimates of errors, we note that the 
upper bounds derived in this section are just that - upper bounds. The 
upper bound may be large and the actual error may nevertheless be 
small. A common characteristic of such bounds is their conservatism: 
They give bounds on the error that are unduly pessimistic for many 
problems. However, if a matrix $A$ of moderate size with moderate-size
elements has a large condition number, then $A^{-1}$ must have some large
entries, and it is well to exercise great caution for the following reason.

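If $Ax = b$ and if we set $C = [c_{ij}] \equiv A^{-1}$, then differentiating the identity $x = Cb$ with respect to the entry $b_j$ gives the identities

$$\frac{\partial x_i}{\partial b_j} = c_{ij}, \qquad i, j = 1, 2, \ldots, n \tag{5.8.12}$$

Furthermore, if we consider $C = A^{-1}$ as a function of $A$, then its entries are just rational functions of the entries of $A$ and hence are differentiable. The identity $CA = I$ means that for all $i, q = 1, \ldots, n$ we have

$$\sum_{p=1}^{n} c_{ip}a_{pq} = \delta_{iq}$$

and hence

$$\sum_{p=1}^{n} \frac{\partial}{\partial a_{jk}}[c_{ip}a_{pq}] = \sum_{p=1}^{n} \Big[\frac{\partial c_{ip}}{\partial a_{jk}}\,a_{pq} + c_{ip}\,\frac{\partial a_{pq}}{\partial a_{jk}}\Big] = \sum_{p=1}^{n} \frac{\partial c_{ip}}{\partial a_{jk}}\,a_{pq} + \delta_{qk}c_{ij} = 0$$

or

$$\sum_{p=1}^{n} \frac{\partial c_{ip}}{\partial a_{jk}}\,a_{pq} = -\delta_{qk}c_{ij}, \qquad i, j, k, q = 1, \ldots, n$$

Now differentiate the identity $x = Cb$ with respect to $a_{jk}$ to obtain

$$\begin{aligned}
\frac{\partial x_i}{\partial a_{jk}} &= \sum_{p=1}^{n} \frac{\partial c_{ip}}{\partial a_{jk}}\,b_p = \sum_{p=1}^{n} \sum_{q=1}^{n} \frac{\partial c_{ip}}{\partial a_{jk}}\,a_{pq}x_q \\
&= \sum_{q=1}^{n} \Big[\sum_{p=1}^{n} \frac{\partial c_{ip}}{\partial a_{jk}}\,a_{pq}\Big]x_q = \sum_{q=1}^{n} [-\delta_{qk}c_{ij}]\,x_q = -c_{ij}x_k
\end{aligned}$$

which is the identity

$$\frac{\partial x_i}{\partial a_{jk}} = -c_{ij} \sum_{p=1}^{n} c_{kp}b_p \tag{5.8.13}$$

Thus, (5.8.12) and (5.8.13) together warn us that if $C = A^{-1}$ has any relatively large entries, then some entry of the solution $x$ may have a large and unavoidable sensitivity to perturbations in some of the entries of $b$ and $A$.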
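The sensitivity identity for the entries of $A$, namely that the derivative of $x_i$ with respect to $a_{jk}$ equals $-c_{ij}x_k$, can be compared with a finite-difference quotient. The sketch below does this for one 2-by-2 system; the data, the step size `h`, and the function names are illustrative assumptions, and indices are zero-based in the code.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(M, v):
    """Matrix-vector product for a 2-by-2 matrix."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def sensitivity_gap(A, b, h=1e-6):
    """Largest gap between a forward difference in a_jk and -c_ij * x_k."""
    C = inv2(A)
    x = apply(C, b)
    worst = 0.0
    for j in range(2):
        for k in range(2):
            A_step = [row[:] for row in A]
            A_step[j][k] += h                      # perturb the single entry a_jk
            x_step = apply(inv2(A_step), b)
            for i in range(2):
                finite_diff = (x_step[i] - x[i]) / h
                predicted = -C[i][j] * x[k]        # value given by the identity
                worst = max(worst, abs(finite_diff - predicted))
    return worst

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
gap = sensitivity_gap(A, b)
```

The remaining gap is of the order of the step size, as expected for a first-order difference quotient.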


Problems 


1. Show that the condition number for inversion of a nonsingular normal matrix with respect to the spectral norm is

$$\kappa(A) = \rho(A)\rho(A^{-1}) = |\lambda_{\max}(A)/\lambda_{\min}(A)|$$

2. Compute the eigenvalues and the inverse of the matrix

$$A = \begin{bmatrix} 1 & -1 \\ -1 & 1+\epsilon \end{bmatrix}, \qquad \epsilon > 0$$

Show that, as $\epsilon \to 0$, the ratio of the largest to smallest eigenvalues of $A$ is of the order of $\epsilon^{-1}$. Use Problem 1 to conclude that $\kappa(A) = O(\epsilon^{-1})$ with respect to the spectral norm. Argue that $\kappa(A) = O(\epsilon^{-1})$ with respect to any norm, so that $A$ is ill conditioned with respect to inversion. Use the explicit form of $A^{-1}$ to verify that $\kappa(A) = O(\epsilon^{-1})$ for any norm.


3. Compute the eigenvalues and the inverse of the matrix

$$B = \begin{bmatrix} 1 & -1 \\ 1 & -1+\epsilon \end{bmatrix}, \qquad \epsilon > 0$$

Show that, as $\epsilon \to 0$, the ratio of the largest to smallest eigenvalues of $B$ is of the order 1. Show, however, that $\kappa(B) = O(\epsilon^{-1})$ with respect to any matrix norm, and hence $B$ is ill conditioned with respect to inversion as $\epsilon \to 0$. Conclude that the ratio of largest to smallest modulus eigenvalues need not be a condition number for non-normal matrices.

4. The condition number $\kappa(A)$ for inversion depends on the matrix norm used, but show that all condition numbers are equivalent in the sense that if $\kappa_\alpha(A) \equiv \|A^{-1}\|_\alpha \|A\|_\alpha$ and $\kappa_\beta(A) \equiv \|A^{-1}\|_\beta \|A\|_\beta$, then there exist finite positive constants $C_m$ and $C_M$ such that

$$C_m\kappa_\alpha(A) \le \kappa_\beta(A) \le C_M\kappa_\alpha(A) \quad \text{for all } A \in M_n$$

5. Show that every unitary matrix $U$ is perfectly conditioned for inversion [$\kappa(U) = 1$] with respect to the spectral norm. If the $l_2$ norm is used, however, the condition number $\kappa(U)$ of every unitary matrix $U \in M_n$ is $n$.

6. Show that $\kappa(A) \ge |\lambda(A)|_{\max} / |\lambda(A)|_{\min}$ for any nonsingular $A \in M_n$ and any matrix norm (max and min refer in this case to the modulus of the eigenvalue). Thus, if this ratio of eigenvalues is large, the matrix must be ill conditioned for inversion, whether or not it is normal. However, Problem 3 shows that if the matrix is not normal, it can be ill conditioned even if this ratio is not large.


7. Provide the details for the generalization (5.8.10) of the bound (5.8.8), to which the former bound reduces when $e = 0$.

8. Let $x$ be a unit vector in $\mathbf{C}^n$ and let $\lambda > 0$. Show that $A \equiv I + \lambda xx^*$ is a Hermitian matrix with eigenvalues 1 (with multiplicity $n-1$) and $1 + \lambda$. Show that $\kappa(A) = 1 + \lambda$ (with respect to the spectral norm), so this gives a simple method to produce an invertible matrix with bounded entries and arbitrarily large condition number. How?


9. Let B be the matrix in P 

Bx = Toi roblem 3, and . , 
Foe e on xal, aa aene inen equations 
Irl/lb] = 0") 0 ae. . Show that the relative error in fhe vecia solu- 
lx—8]/[x] = O(e7”?) se? 0, but that the relative error in the resid aal is 
from approximate solution as e—>0. Thus, small residuals ns unon is 
the bound (5.8.11). s that have large errors. Explain in Tight of 


10. If $\det A$ is small (or large), must $\kappa(A)$ be large? Hint: Consider $A = \lambda I \in M_n$.

11. The result of (5.8.4) is, in general, weaker than that of (5.8.2) because the hypothesis $\|A^{-1}\|\,\|E\| < 1$ is more restrictive than having $\|A^{-1}E\| < 1$. Nevertheless, even if the stronger hypothesis is satisfied, (5.8.2) may still give a better upper bound than (5.8.4). Illustrate this with $E = \epsilon A$, $0 < \epsilon < 1$.

12. Find the analogs of the bounds (5.8.7) and (5.8.8) if the equations (5.8.5) and (5.8.6) are replaced by

$$AX = B \quad \text{and} \quad (A+E)\hat{X} = B$$

Take $B = I$; does this help to “explain” why the upper bounds in (5.8.2) and (5.8.7) are the same?

13. [...] (5.8.1), which requires that $\rho(A^{-1}E) < 1$. Show that if $A$ and $A+E$ are both invertible, then

$$\|A^{-1} - (A+E)^{-1}\| \le \|A^{-1}\|\,\|(A+E)^{-1}\|\,\|E\|$$

for any matrix norm $\|\cdot\|$ regardless of the size of $A^{-1}E$. Hint: Show that $A^{-1} - (A+E)^{-1} = (A+E)^{-1}EA^{-1}$.

14. Perhaps the most commonly cited example of an ill-conditioned matrix is the Hilbert matrix $H_n = [h_{ij}] \in M_n$ defined by $h_{ij} = 1/(i+j-1)$. It is a fact that the condition number of $H_n$ with respect to the spectral norm is asymptotically equal to $e^{cn}$, where the constant $c$ is approximately 3.5, and it is also a fact that $\rho(H_n) = \pi + O[1/(\log n)]$ as $n \to \infty$. We have $\kappa(H_3) \sim 5 \times 10^2$, $\kappa(H_6) \sim 1.5 \times 10^7$, and $\kappa(H_8) \sim 1.5 \times 10^{10}$. Explain why $H_n$ is so poorly conditioned even though the elements of $H_n$ are all uniformly bounded and $\rho(H_n)$ is not large.


15. If the spectral norm is used, show that $\kappa(A^*A) = \kappa(AA^*) = [\kappa(A)]^2$. Explain why the problem of solving $A^*Ax = y$ may be intrinsically less tractable numerically than the problem of solving $Ax = z$.

16. Let $A \in M_n$ be nonsingular. Use the inequality in Problem 37 of Section (5.6) to show that $\kappa(A) \ge \|A\|/\|A - B\|$ for any singular $B \in M_n$. Here, $\|\cdot\|$ is any matrix norm and $\kappa(\cdot)$ is the associated condition number. This lower bound can be useful in showing that a given matrix $A$ is ill conditioned.

17. Let $A = [a_{ij}] \in M_n$ be an upper triangular matrix with all $a_{ii} \ne 0$. Use Problem 16 to show that the condition number with respect to the maximum row sum norm has the lower bound

$$\kappa(A) \ge \frac{\|A\|_\infty}{\min_{1 \le i \le n} |a_{ii}|}$$

Further Reading. The problem of finding a priori bounds for errors in 
solving linear systems of equations has been a central one in numerical 
linear algebra; see [Ste]. 


CHAPTER 6 


Location and perturbation of eigenvalues 


6.0 Introduction 


The eigenvalues of a diagonal matrix are very easy to locate, and the 
eigenvalues of a matrix are continuous functions of the entries, so it is 
natural to ask whether one can say anything useful about the eigenvalues 
of a matrix whose off-diagonal elements are “small” relative to the main 
diagonal entries. Such matrices do arise in practice; large systems of 
linear equations resulting from numerical discretization of boundary 
value problems for elliptic partial differential equations can be of this 
form. 

In some differential equations problems involving the long-term stability of an oscillating system, one is sometimes interested in showing that the eigenvalues $\{\lambda_i\}$ of a matrix all lie in the left half-plane, that is, that $\operatorname{Re}(\lambda_i) < 0$. And sometimes in statistics or numerical analysis one needs to show that a Hermitian matrix is positive definite, that is, that all $\lambda_i > 0$.

Sometimes one wants to locate the eigenvalues of a matrix in a bounded set that is easily characterized. We know that all the eigenvalues of a matrix $A$ are located in a disc in the complex plane centered at the origin and having radius $\|A\|$, where $\|\cdot\|$ is any matrix norm. But can one do better than this by more precisely locating regions that must either include or exclude the eigenvalues? We shall see that one can.

Finally, suppose that one knows exactly the eigenvalues of the matrix $A$, but that $A$ is subjected to a perturbation $A \to A + E$. How do the eigenvalues change? Because the eigenvalues are continuous functions of the entries of $A$, we have reason to believe that if the perturbation matrix $E$ is small enough, then the eigenvalues should not change too drastically. But one needs precise bounds to know how small is “small” in each case. The basic issue here is the same as in Section (5.8), where we discussed the sensitivity of the solution of a system of linear equations to perturbations in the data.


6.1 Geršgorin discs


If $A \in M_n$, we can always write $A = D + B$, where $D = \operatorname{diag}(a_{11}, \ldots, a_{nn})$ is just the main diagonal part of $A$, and $B$ has a zero main diagonal. If we set $A_\epsilon \equiv D + \epsilon B$ for any $\epsilon \in \mathbf{C}$, then $A_0 = D$ and $A_1 = A$. The eigenvalues of $A_0 = D$ are easy to locate: They are just the points $a_{11}, \ldots, a_{nn}$ in the complex plane. We have reason to suspect that if $\epsilon$ is small enough, then the eigenvalues of $A_\epsilon$ will be located in some small neighborhoods of the points $a_{11}, \ldots, a_{nn}$. The following theorem (the Geršgorin disc theorem) makes this observation precise: There are indeed some easily computed discs centered at the points $a_{ii}$ that are guaranteed to contain the eigenvalues.


6.1.1 Theorem (Geršgorin). Let $A = [a_{ij}] \in M_n$, and let

$$R_i'(A) \equiv \sum_{\substack{j=1 \\ j \ne i}}^{n} |a_{ij}|, \qquad 1 \le i \le n$$

denote the deleted absolute row sums of $A$. Then all the eigenvalues of $A$ are located in the union of $n$ discs

$$\bigcup_{i=1}^{n} \{z \in \mathbf{C} : |z - a_{ii}| \le R_i'(A)\} \equiv G(A) \tag{6.1.2}$$

Furthermore, if a union of $k$ of these $n$ discs forms a connected region that is disjoint from all the remaining $n-k$ discs, then there are precisely $k$ eigenvalues of $A$ in this region.

Proof: Let $\lambda$ be an eigenvalue of $A$, and suppose $Ax = \lambda x$, $x = [x_i] \ne 0$. There is an element of $x$ that has largest absolute value, say $|x_p| \ge |x_i|$ for all $i = 1, 2, \ldots, n$, and $x_p \ne 0$. Then the assumption that $Ax = \lambda x$ means that

$$\lambda x_p = [\lambda x]_p = [Ax]_p = \sum_{j=1}^{n} a_{pj}x_j$$

which is equivalent to

$$x_p(\lambda - a_{pp}) = \sum_{\substack{j=1 \\ j \ne p}}^{n} a_{pj}x_j$$

But then the triangle inequality permits us to conclude that

$$|x_p|\,|\lambda - a_{pp}| = \Big|\sum_{\substack{j=1 \\ j \ne p}}^{n} a_{pj}x_j\Big| \le \sum_{\substack{j=1 \\ j \ne p}}^{n} |a_{pj}x_j| = \sum_{\substack{j=1 \\ j \ne p}}^{n} |a_{pj}|\,|x_j| \le |x_p| \sum_{\substack{j=1 \\ j \ne p}}^{n} |a_{pj}| = |x_p|\,R_p'$$

Thus, $|\lambda - a_{pp}| \le R_p'$ for some $p$; that is, $\lambda$ lies in a closed disc around $a_{pp}$ of radius $R_p'$. Since we do not know which $p$ is appropriate to each eigenvalue $\lambda$ (unless we know the associated eigenvector, in which case we would know $\lambda$ exactly and would not be interested in locating it), we can only conclude that $\lambda$ lies in the union of all such discs, which is the region (6.1.2).

In order to prove the second assertion of the theorem, write $A = D + B$, where $D = \operatorname{diag}(a_{11}, \ldots, a_{nn})$, and set $A_\epsilon \equiv D + \epsilon B$ for $\epsilon \in [0, 1]$. Notice that $R_i'(A_\epsilon) = R_i'(\epsilon B) = \epsilon R_i'(A)$. For convenience, suppose the first $k$ discs

$$\bigcup_{i=1}^{k} \{z \in \mathbf{C} : |z - a_{ii}| \le R_i'(A)\}$$

form a connected region $G_k$ that is disjoint from the complementary region $G_k^c$ consisting of the $n-k$ remaining discs, that is, $G_k^c \equiv G(A) \setminus G_k$. Notice that the union of the first $k$ discs of $A_\epsilon$

$$G_k(\epsilon) \equiv \bigcup_{i=1}^{k} \{z \in \mathbf{C} : |z - a_{ii}| \le R_i'(A_\epsilon) = \epsilon R_i'(A)\}$$

is contained in the connected set $G_k \equiv G_k(1)$ for all $\epsilon \in [0, 1]$, but that $G_k(\epsilon)$ may not itself be a connected set for all such $\epsilon$. Furthermore, none of the complementary regions $G_k^c(\epsilon) \equiv G_n(\epsilon) \setminus G_k(\epsilon)$ ever intersect $G_k$. For each $i = 1, \ldots, k$, consider the eigenvalues $\lambda_i(A_0) = a_{ii}$ and $\lambda_i(A_\epsilon)$, $\epsilon > 0$. Because the eigenvalues are continuous functions of the entries of $A$ (see Appendix D), and because all $\lambda_i(A_\epsilon) \in G_k(\epsilon) \subset G_k$ for all $\epsilon \in [0, 1]$, each $\lambda_i(A_0)$ is joined to some $\lambda_i(A_1) = \lambda_i(A)$ by the continuous curve in $G_k$ given by $\{\lambda_i(A_\epsilon) : 0 \le \epsilon \le 1\}$. For each $\epsilon \in [0, 1]$ we conclude that there are at least $k$ eigenvalues of $A_\epsilon$ contained in $G_k(\epsilon)$. But there cannot be more than $k$ because the remaining $n-k$ eigenvalues of $A_0$ start outside the connected set $G_k$ and follow continuous curves that must remain within the complementary region $G_k^c$; because of continuity and connectivity (this is the intermediate value theorem for continuous functions), they cannot leap the void between $G_k^c$ and $G_k$. □
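The theorem translates directly into a short computation. The sketch below lists the row discs of a small complex matrix and confirms that its eigenvalues, obtained from the characteristic polynomial, lie in them; the matrix and the helper names are illustrative assumptions, and the 2-by-2 eigenvalue formula is used only to keep the example free of external libraries.

```python
import cmath

def gershgorin_discs(A):
    """Return (center, radius) pairs: center a_ii, radius the deleted row sum."""
    return [(row[i], sum(abs(v) for j, v in enumerate(row) if j != i))
            for i, row in enumerate(A)]

def eig2(M):
    """Eigenvalues of a 2-by-2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]

def in_some_disc(z, discs, tol=1e-12):
    """True if z lies in at least one closed disc (up to a small tolerance)."""
    return any(abs(z - c) <= r + tol for c, r in discs)

# Hypothetical matrix whose two discs are disjoint.
A = [[4.0, 1.0], [0.5, -1.0 + 2.0j]]
discs = gershgorin_discs(A)
eigs = eig2(A)
all_inside = all(in_some_disc(z, discs) for z in eigs)
in_first_disc = sum(abs(z - discs[0][0]) <= discs[0][1] for z in eigs)
```

Because the two discs here are disjoint, the second assertion of the theorem says that each contains exactly one eigenvalue, which is what `in_first_disc` counts for the first disc.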


The region $G(A)$ in (6.1.2) is often called the Geršgorin region (for rows) of $A$; the individual discs in $G(A)$ are called Geršgorin discs, and the boundaries of these discs are called Geršgorin circles. Since $A$ and $A^T$ have the same eigenvalues, one can obtain a Geršgorin disc theorem for columns by applying the Geršgorin disc theorem to $A^T$ to obtain a region that contains the eigenvalues of $A$ and is specified in terms of deleted absolute column sums

$$C_j'(A) \equiv \sum_{\substack{i=1 \\ i \ne j}}^{n} |a_{ij}|$$

6.1.3 Corollary. If $A = [a_{ij}] \in M_n$, then all the eigenvalues of $A$ are located in the union of $n$ discs

$$\bigcup_{j=1}^{n} \{z \in \mathbf{C} : |z - a_{jj}| \le C_j'\} = G(A^T) \tag{6.1.4}$$

Furthermore, if a union of $k$ of these discs forms a connected region that is disjoint from all the remaining $n-k$ discs, then there are precisely $k$ eigenvalues of $A$ in this region.

Exercise. Show that all the eigenvalues of $A$ lie in the intersection of the regions (6.1.2) and (6.1.4), that is, in $G(A) \cap G(A^T)$. Illustrate with the 3-by-3 matrix $[a_{ij}]$ with $a_{ij} = i/j$.


Since all the eigenvalues of $A$ are located in the two regions (6.1.2) and (6.1.4), the largest modulus eigenvalue of $A$ is located there. The point in the $i$th disc in $G(A)$ that is farthest from the origin has modulus

$$|a_{ii}| + R_i' = \sum_{j=1}^{n} |a_{ij}|$$

so the largest of these values must be an upper bound for the largest modulus eigenvalue of $A$. Of course, a similar argument can be made for the absolute column sums.

6.1.5 Corollary. If $A = [a_{ij}] \in M_n$, then

$$\rho(A) \le \min\Big\{\max_i \sum_{j=1}^{n} |a_{ij}|,\ \max_j \sum_{i=1}^{n} |a_{ij}|\Big\}$$

This result is no surprise, since it says that $\rho(A) \le \|A\|_\infty$ and $\rho(A) \le \|A\|_1$ (the maximum absolute row sum and maximum absolute column sum norms), and this inequality holds for any matrix norm. But it is interesting to have an essentially geometric derivation of this fact.
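A minimal sketch of the bound in Corollary (6.1.5), using exact rational arithmetic, is given below for the 3-by-3 matrix with entries $i/j$. The function name is an illustrative assumption. Since this matrix is the rank 1 product of the column $(1, 2, 3)^T$ and the row $(1, 1/2, 1/3)$, its only nonzero eigenvalue equals its trace, which gives the spectral radius without an eigenvalue routine.

```python
from fractions import Fraction

def radius_bound(A):
    """min(max absolute row sum, max absolute column sum), as in Corollary 6.1.5."""
    n = len(A)
    max_row = max(sum(abs(A[i][j]) for j in range(n)) for i in range(n))
    max_col = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    return min(max_row, max_col)

# Entries a_ij = i/j for i, j = 1, 2, 3, stored exactly.
A = [[Fraction(i, j) for j in range(1, 4)] for i in range(1, 4)]
bound = radius_bound(A)                   # max row sum 11/2, max column sum 6
trace = sum(A[i][i] for i in range(3))    # equals the spectral radius here
```

The bound 11/2 comes from the row sums and is not attained: the spectral radius is 3.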

Since $S^{-1}AS$ has the same eigenvalues as $A$ whenever $S$ is invertible, we can apply the Geršgorin theorem to $S^{-1}AS$; perhaps for some choice of $S$ the bounds obtained may be sharper. A particularly convenient choice is $S = D = \operatorname{diag}(p_1, p_2, \ldots, p_n)$ with all $p_i > 0$. One calculates easily that $D^{-1}AD = [p_j a_{ij}/p_i]$. Applying the Geršgorin theorem to $D^{-1}AD$ and to its transpose yields the following.

[Figure 6.1.7, panels (a) and (b)]

6.1.6 Corollary. Let $A = [a_{ij}] \in M_n$ and let $p_1, p_2, \ldots, p_n$ be positive real numbers. Then all the eigenvalues of $A$ lie in the region

$$\bigcup_{i=1}^{n} \Big\{z \in \mathbf{C} : |z - a_{ii}| \le \frac{1}{p_i} \sum_{\substack{j=1 \\ j \ne i}}^{n} p_j|a_{ij}|\Big\} = G(D^{-1}AD)$$

as well as in the region

$$\bigcup_{j=1}^{n} \Big\{z \in \mathbf{C} : |z - a_{jj}| \le p_j \sum_{\substack{i=1 \\ i \ne j}}^{n} \frac{1}{p_i}|a_{ij}|\Big\} = G[(D^{-1}AD)^T]$$

The matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$ has eigenvalues 1 and 2. A straightforward application of the Geršgorin theorem gives a rather gross estimate for the eigenvalues (Fig. 6.1.7a), but the extra parameters in the last corollary give enough flexibility to obtain an arbitrarily good estimate of the eigenvalues (Fig. 6.1.7b).
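The effect of the free parameters can be sketched in a few lines. The example below assumes the upper triangular matrix with rows (1, 1) and (0, 2), whose eigenvalues are 1 and 2, and compares the unscaled row discs with those of $D^{-1}AD$ for one choice of weights; the function name and the weights are illustrative assumptions.

```python
def scaled_discs(A, p):
    """Row discs of D^-1 A D, D = diag(p): radius (1/p_i) * sum_{j != i} p_j |a_ij|."""
    n = len(A)
    return [(A[i][i],
             sum(p[j] * abs(A[i][j]) for j in range(n) if j != i) / p[i])
            for i in range(n)]

A = [[1.0, 1.0], [0.0, 2.0]]           # assumed example with eigenvalues 1 and 2
plain = scaled_discs(A, [1.0, 1.0])    # ordinary Gersgorin discs
tight = scaled_discs(A, [100.0, 1.0])  # a large p_1 shrinks the first disc
```

With unit weights the first disc has radius 1 and reaches the second eigenvalue; with `p = [100, 1]` its radius drops to 0.01, and increasing the first weight further makes it as small as desired.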


Exercise. Consider the matrix

$$A = \begin{bmatrix} 7 & -16 & 8 \\ -16 & 7 & -8 \\ 8 & -8 & -5 \end{bmatrix}$$

Use the Geršgorin theorem to say as much as you can about the location of the eigenvalues of $A$ and the spectral radius of $A$. Then consider $D^{-1}AD$, where $D = \operatorname{diag}(p_1, p_2, p_3)$, and see if you can obtain any improvement in your location of the eigenvalues. Finally, compute the actual eigenvalues and comment on how well you did with the estimates.

Exercise. Show that every eigenvalue of $A$ lies in the set $\bigcap_D G(D^{-1}AD)$, where the intersection is over all diagonal matrices with positive main diagonal entries.


The idea of introducing free parameters can also be used to obtain a more general form of the estimates (6.1.5) for the spectral radius.

6.1.8 Corollary. Let $A = [a_{ij}] \in M_n$. Then

$$\rho(A) \le \min_{p_1, \ldots, p_n > 0}\ \max_{1 \le i \le n}\ \frac{1}{p_i} \sum_{j=1}^{n} p_j|a_{ij}|$$

and

$$\rho(A) \le \min_{p_1, \ldots, p_n > 0}\ \max_{1 \le j \le n}\ p_j \sum_{i=1}^{n} \frac{1}{p_i}|a_{ij}|$$

Exercise. Prove Corollary (6.1.8).

Exercise. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, where $a$, $b$, $c$, and $d$ are strictly positive real numbers.

(a) By direct calculation, find an explicit diagonal matrix $D$ such that $\|D^{-1}AD\|_\infty = \min_D \|D^{-1}AD\|_\infty$, where the minimum is taken over all diagonal matrices $D$ with positive main diagonal entries.

(b) Calculate $\|D^{-1}AD\|_\infty \equiv r$.

(c) Calculate $\rho(A)$ explicitly.

(d) Note that $r = \rho(A)$.

We shall show later that if $A$ is any $n$-by-$n$ positive matrix (or, more generally, is irreducible and nonnegative), then the minimum over all $D$ of the maximum row sum of $D^{-1}AD$ is always equal to the spectral radius. This is not the case in general.


Exercise. Consider A = [ 1's >| and show that p(A)<min||D~'AD|j,, 


over all $D = \operatorname{diag}(p_1, p_2)$ with $p_1$ and $p_2$ positive.


If one has some additional information about a matrix, which forces its eigenvalues to lie in (or not in) certain sets, then this information can be used along with the Geršgorin theorem to give an even more precise location for the eigenvalues. For example, if $A$ is Hermitian, then the eigenvalues of $A$ must all be real, so they must lie in the set $\mathbf{R} \cap G(A)$, which is a finite union of closed real intervals.

Exercise. What can be said about the location of the eigenvalues of a skew-Hermitian matrix? A unitary matrix? A real orthogonal matrix?

Since a matrix is invertible if and only if 0 is not an eigenvalue, it is of interest to develop conditions that exclude the origin from the region known to contain the eigenvalues.


6.1.9 Definition. Let 4 = [a;;]€M,. The matrix A is said to be diag- 
onally dominant if 
{i 
laj] = È la;;| =R/ forall j= lL... A 
i= 


It is said to be strictly diagonally dominant if 


{i 
lal > > |a;,| = R/ forall f=1,...,7 
j=l 
jxi 
From the geometry of the situation it is apparent that 0 cannot lie in 
any closed Geršgorin disc if A is strictly diagonally dominant. Further- 
more, if all the main diagonal entries aj; are real and positive, then each 
disc actually lies in the open right half-plane; if A is Hermitian as well, 
then the eigenvalues must all be positive. We summarize these observa- 
tions in the following theorem, of which part (a) is known as the Levy- 
Desplanques theorem [see Corollary (5 .6.17)]. 


6.1.10 Theorem. Let $A = [a_{ij}] \in M_n$ be strictly diagonally dominant.
Then

(a) A is invertible.

(b) If all main diagonal entries of A are positive, then all the
eigenvalues of A have positive real part.

(c) If A is Hermitian and all main diagonal entries of A are positive,
then all the eigenvalues of A are real and positive.
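
The hypothesis of Theorem (6.1.10) is easy to test numerically. The following is a minimal sketch in Python; the helper names `deleted_row_sums` and `is_strictly_diagonally_dominant` and the sample matrix are our own illustrative choices, not part of the text.

```python
def deleted_row_sums(A):
    """Return the deleted absolute row sums R_i' = sum over j != i of |a_ij|."""
    return [sum(abs(a) for j, a in enumerate(row) if j != i)
            for i, row in enumerate(A)]


def is_strictly_diagonally_dominant(A):
    """True if |a_ii| > R_i' for every row i."""
    return all(abs(A[i][i]) > r for i, r in enumerate(deleted_row_sums(A)))


# Hypothetical sample matrix: diagonal entries 4, 5, 3 and R' = [2, 3, 2].
A = [[4, 1, -1],
     [1, 5, 2],
     [0, -2, 3]]

dominant = is_strictly_diagonally_dominant(A)
positive_diagonal = all(A[i][i] > 0 for i in range(len(A)))
print(dominant, positive_diagonal)
```

When both flags are true, parts (a) and (b) of the theorem apply: the matrix is invertible and its eigenvalues lie in the open right half-plane.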


. . I . ` 
Exercise. Consider [ 1 r] and p L i] to show that diagonal dominance 


alone is not sufficient to guarantee invertibility, and that strict diagonal 
dominance is not necessary for invertibility. 


350 Location and perturbation of eigenvalues 


By using the extra parameters in Corollary (6.1.6), the assumption of
strict diagonal dominance as a sufficient condition for invertibility can
be relaxed slightly.


6.1.11 Theorem. Let $A = [a_{ij}] \in M_n$ have all diagonal entries nonzero
and be diagonally dominant with $|a_{ii}| > R_i'$ for all but one value of
$i = 1, \dots, n$. Then A is invertible.


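Proof: The hypothesis is that for some k, $|a_{kk}| = R_k'$ and
$|a_{ii}| > R_i'$ for all $i \ne k$. In (6.1.6), let $p_i = 1$ for all
$i \ne k$ and let $p_k = 1 + \epsilon$, $\epsilon > 0$. Then

$$\frac{1}{p_k} \sum_{j=1,\, j \ne k}^{n} p_j |a_{kj}| = \frac{1}{1+\epsilon} R_k' < |a_{kk}| \quad \text{for any } \epsilon > 0$$

and

$$\frac{1}{p_i} \sum_{j=1,\, j \ne i}^{n} p_j |a_{ij}| = R_i' + \epsilon |a_{ik}| \quad \text{for all } i \ne k$$

But since $R_i' < |a_{ii}|$ for all $i \ne k$, we can choose $\epsilon > 0$
small enough so that $R_i' + \epsilon |a_{ik}| < |a_{ii}|$ for all $i \ne k$.
By Corollary (6.1.6), the point z = 0 is therefore excluded from
$G(D^{-1}AD)$ and so A must be invertible. □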
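
The scaling argument in this proof can be checked on a small example. The sketch below, in Python, uses a hypothetical 3-by-3 matrix whose first row has equality $|a_{kk}| = R_k'$; the function name `scaled_deleted_row_sums` and the value of $\epsilon$ are our own assumptions for illustration.

```python
def scaled_deleted_row_sums(A, p):
    """Deleted absolute row sums of D^{-1} A D with D = diag(p):
    (1/p_i) * sum over j != i of p_j * |a_ij|."""
    n = len(A)
    return [sum(p[j] * abs(A[i][j]) for j in range(n) if j != i) / p[i]
            for i in range(n)]


A = [[2.0, 1.0, 1.0],   # row k = 0: |a_00| = 2 = R_0' (equality)
     [1.0, 4.0, 1.0],   # 4 > 2
     [0.0, 1.0, 3.0]]   # 3 > 1

k, eps = 0, 0.5          # eps is an illustrative choice
p = [1.0] * len(A)
p[k] = 1.0 + eps

radii = scaled_deleted_row_sums(A, p)
# After scaling, every row satisfies the strict inequality.
strict = all(abs(A[i][i]) > radii[i] for i in range(len(A)))
print(radii, strict)
```

Row k shrinks its radius by the factor $1/(1+\epsilon)$, while each other row grows by $\epsilon |a_{ik}|$, which stays below the slack $|a_{ii}| - R_i'$ for this choice of $\epsilon$.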


The Geršgorin theorem and its variations give inclusion regions for the
eigenvalues of A, which depend only on the main diagonal entries of A
and the absolute values of the off-diagonal entries. Using the fact that
$S^{-1}AS$ has the same eigenvalues as A led us to (6.1.6) and to the fact
that the closed set

$$\bigcap_{D} G(D^{-1}AD), \quad D = \mathrm{diag}(p_1, \dots, p_n), \text{ all } p_i > 0 \qquad (6.1.12)$$

contains all the eigenvalues of $A \in M_n$. We know that we could get even
smaller inclusion regions for the eigenvalues if we were to admit more
complicated similarities than diagonal ones, but if we restrict ourselves to
just diagonal similarities and to the use of just the main diagonal entries
and the absolute values of the off-diagonal entries, can we somehow do
better than (6.1.12)?

The answer is no, for the following reason: Let z be any given point
on the boundary of the set (6.1.12). Then R. Varga has shown that there
exists a matrix $B = [b_{ij}] \in M_n$ such that $b_{ii} = a_{ii}$ for all
$i = 1, \dots, n$ and $|b_{ij}| = |a_{ij}|$ for all $i, j = 1, \dots, n$, and
such that z is an eigenvalue of B.


Problems 
1. Consider the following iterative algorithm for solving the n-by-n
system of linear equations Ax = y, where A and y are given:

(i) Define B = I - A and rewrite the system as x = Bx + y.
(ii) Choose an initial approximation $x^{(0)}$ to the solution in any way
you wish.
(iii) For m = 0, 1, 2, ... calculate $x^{(m+1)} = Bx^{(m)} + y$.
(iv) Hope that $x^{(m)} \to x$ (the solution) as $m \to \infty$.

(a) Denote by $\epsilon^{(m)} = x^{(m)} - x$ the error in the mth
approximation to the solution, and show that
$\epsilon^{(m)} = B^m (x^{(0)} - x)$. (b) Conclude that if
$\rho(I - A) < 1$, then this algorithm works in the sense that
$x^{(m)} \to x$ as $m \to \infty$ regardless of the choice of the initial
approximation $x^{(0)}$. (c) Use the Geršgorin theorem to give a simple
explicit condition on A that is sufficient for this algorithm to work.
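
Steps (i)-(iv) can be sketched in a few lines of Python. The function names, the iteration count, and the test matrix below are our own assumptions; the matrix was chosen so that $\rho(I - A) < 1$, and the sketch makes no attempt to detect divergence.

```python
def mat_vec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def fixed_point_solve(A, y, iterations=200):
    """Iterate x <- Bx + y with B = I - A, starting from x = 0."""
    n = len(A)
    B = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
         for i in range(n)]
    x = [0.0] * n                      # step (ii): arbitrary starting point
    for _ in range(iterations):        # step (iii)
        x = [bx + yi for bx, yi in zip(mat_vec(B, x), y)]
    return x


# Hypothetical example: the absolute row sums of I - A are 0.3, 0.6, 0.5.
A = [[1.0, 0.2, 0.1],
     [0.3, 0.9, -0.2],
     [0.0, 0.4, 1.1]]
y = [1.0, 2.0, 3.0]

x = fixed_point_solve(A, y)
residual = max(abs(ax - yi) for ax, yi in zip(mat_vec(A, x), y))
print(x, residual)
```

Because the error obeys $\epsilon^{(m)} = B^m \epsilon^{(0)}$, the residual here decays geometrically; for a matrix with $\rho(I - A) \ge 1$ the same loop may fail to converge.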


2. Show that $\bigcap_S G(S^{-1}AS) = \sigma(A)$ if the intersection is
taken over all nonsingular S.


3. Use (6.1.5) to show that

$$|\det A| \le \prod_{i=1}^{n} \left( \sum_{j=1}^{n} |a_{ij}| \right)$$

for any $A \in M_n$, with a similar inequality for the columns. Hint: If any
row of A is zero there is nothing to prove. If all rows of A are nonzero,
then let B be the matrix whose rows consist of the rows of A divided by
the corresponding absolute row sums of A. Then $\rho(B) \le 1$ by (6.1.5),
so $|\det B| \le 1$. Notice that this says that

$$|\det A| \le \prod_{i=1}^{n} \|a_i\|_1$$

where the vectors $a_i$ are the respective rows (or columns) of A. Is there
such an inequality for other norms? Hint: See (7.8.2).


4. In the text we derived Theorem (6.1.10a), the Levy-Desplanques
theorem, from the Geršgorin theorem (6.1.1). Show that the first part of
(6.1.1) [the fact that the region (6.1.2) contains all the eigenvalues of A]
follows from part (a) of (6.1.10). Hint: Apply (6.1.10a) to the matrix
$\lambda I - A$.


5. Suppose that $A \in M_n$ is a real matrix whose n Geršgorin discs are
all mutually disjoint. Show that all the eigenvalues of A are real. More
generally, show that the same is true, and for the same reason, if a
complex matrix $A \in M_n$ has real main diagonal entries and its
characteristic polynomial has only real coefficients.


6. Show that if $A = [a_{ij}] \in M_n$ and if $|a_{ii}| > R_i'$ for k
different values of i, then $k \le \mathrm{rank}\, A$.


7. Suppose that $A \in M_n$ is idempotent ($A^2 = A$), but that $A \ne I$.
Show that A cannot be strictly diagonally dominant [or irreducibly
diagonally dominant; see (6.2.25) and (6.2.27)].


8. Suppose that $A \in M_n$ is strictly diagonally dominant, $|a_{ii}| > R_i'$
for all $i = 1, \dots, n$. Show that $|a_{kk}| > C_k'$ for at least one value
of $k = 1, \dots, n$.


9. Suppose $A = [a_{ij}] \in M_n$ is strictly diagonally dominant, and let
$D = \mathrm{diag}(a_{11}, a_{22}, \dots, a_{nn})$. Show that D is invertible
and that $\rho(I - D^{-1}A) < 1$. Hint: Use Corollary (6.1.5).


10. If $A = [a_{ij}] \in M_n$ and if $R_i = R_i' + |a_{ii}|$ denotes the sum
of the absolute values of all the entries in the ith row of A, show that

$$\mathrm{rank}\, A \ge \sum_{i=1}^{n} \frac{|a_{ii}|}{R_i}$$

where we agree that 0/0 = 0 in this sum. Hint: Multiplying all the
elements in a row by the same nonzero scalar does not change the rank,
so it suffices to assume that all $a_{ii} \ge 0$ and all $R_i$ are either
zero or 1. In this case, all the eigenvalues of A lie in the unit disc and
one must show that

$$\mathrm{rank}\, A \ge \sum_{i=1}^{n} a_{ii}$$

Show that $\sum_i a_{ii} = \mathrm{tr}\, A = \sum_i \lambda_i \le \sum_i |\lambda_i| \le$
number of nonzero eigenvalues of A $\le \mathrm{rank}\, A$.


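11. If $A = [a_{ij}] = [a_1\ a_2\ \dots\ a_n] \in M_n$, show that

$$\mathrm{rank}\, A \ge \sum_{i=1}^{n} \frac{|a_{ii}|^2}{\|a_i\|_2^2}$$

where we agree that 0/0 = 0 in the sum. Hint: As in Problem 10, show
that it suffices to consider the case in which all the columns of A have
unit Euclidean length, that is, all $\|a_i\|_2 = 1$, and in this case one must
show that

$$\mathrm{rank}\, A \ge \sum_{i=1}^{n} |a_{ii}|^2 = \sum_{i=1}^{n} |e_i^* a_i|^2$$

where $\{e_1, e_2, \dots, e_n\}$ is the standard orthonormal basis of $C^n$.
If A has rank k, show that there are k orthonormal vectors
$v_1, \dots, v_k \in C^n$ such that
$\mathrm{Span}\{v_1, \dots, v_k\} = \mathrm{Span}\{a_1, \dots, a_n\}$. Then

$$a_i = \sum_{j=1}^{k} (v_j^* a_i) v_j, \quad \text{so} \quad e_i^* a_i = \sum_{j=1}^{k} (v_j^* a_i)(e_i^* v_j)$$

and

$$\sum_{i=1}^{n} |e_i^* a_i|^2 \le \sum_{i=1}^{n} \left( \sum_{j=1}^{k} |v_j^* a_i|^2 \right) \left( \sum_{j=1}^{k} |e_i^* v_j|^2 \right) = \sum_{i=1}^{n} \sum_{j=1}^{k} |e_i^* v_j|^2 = \sum_{j=1}^{k} 1 = k = \mathrm{rank}\, A$$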


Further Readings. A discussion of the Geršgorin theorem with some
numerical examples can be found in [Ste]. The original reference is
S. Geršgorin, “Über die Abgrenzung der Eigenwerte einer Matrix,” Izv.
Akad. Nauk. S.S.S.R. 7 (1931), 749-754. There is a generalization of
Geršgorin’s theorem (6.1.1) that gives inclusion regions for the spectrum
of the generalized eigenvalue problem $Ax = \lambda Bx$ and covers the case
in which B is singular; see G. W. Stewart, “Gerschgorin Theory for the
Generalized Eigenvalue Problem,” Math. Comput. 29 (1975), 600-606. For
a proof that the region (6.1.12) has the optimality property stated in the
last paragraph of the section, see R. Varga, “Minimal Gerschgorin Sets,”
Pacific J. Math. 15 (1965), 719-729.


6.2 Geršgorin discs - a closer look
We have seen that strict diagonal dominance is sufficient for invertibility
but that diagonal dominance is not. Consideration of some 2-by-2
examples suggests the conjecture that diagonal dominance together with
strict inequality

$$|a_{ii}| > \sum_{j=1,\, j \ne i}^{n} |a_{ij}| \quad \text{for at least one value of } i = 1, \dots, n \qquad (6.2.1)$$

may be sufficient for invertibility. Unfortunately, this is not the case, as
is shown by the example

$$\begin{bmatrix} 4 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix} \qquad (6.2.2)$$

However, there are useful conditions on a diagonally dominant matrix
under which (6.2.1) is sufficient to guarantee invertibility, and they lead
to some very interesting ideas in graph theory. The fundamental
observation is that if A is diagonally dominant, then 0 cannot be an
interior point of any individual Geršgorin disc.




Exercise. Show that a given point $\lambda$ is not an interior point of any
Geršgorin disc of A if and only if

$$|\lambda - a_{ii}| \ge R_i' = \sum_{j=1,\, j \ne i}^{n} |a_{ij}| \quad \text{for all } i = 1, \dots, n \qquad (6.2.2a)$$

Show that every point \ on the boundary of G(A) satisfies these inequal- 


1 -1 1 NODO 
ities. Consider \ = 0 and A = [i id Sl i _i| to show that an interior 


point of G(A) can also satisfy the inequalities (6.2.2a). 


A careful analysis of the proof of Theorem (6.1.1) clarifies what hap- 
pens when an eigenvalue of A satisfies the inequalities (6.2.2a). 


6.2.3 Lemma. Let $A = [a_{ij}] \in M_n$ and let $\lambda$ be an eigenvalue of
A that satisfies the inequalities (6.2.2a). Let $Ax = \lambda x$,
$x = [x_i] \ne 0$, and suppose p is an index such that
$|x_p| = \max_{1 \le i \le n} |x_i| = \|x\|_\infty \ne 0$. Then
(a) If k is any index such that $|x_k| = |x_p|$, then
$|\lambda - a_{kk}| = R_k'$; that is, the kth Geršgorin circle passes
through $\lambda$; and
(b) If $|x_k| = |x_p|$ for some $k = 1, \dots, n$ and if $a_{kj} \ne 0$ for
some $j \ne k$, then $|x_j| = |x_p|$ as well.


Proof: Just as in the proof of the Geršgorin theorem, we have

$$(\lambda - a_{ii}) x_i = \sum_{j=1,\, j \ne i}^{n} a_{ij} x_j \quad \text{for all } i = 1, \dots, n$$

and hence

$$|\lambda - a_{ii}|\,|x_i| = \left| \sum_{j \ne i} a_{ij} x_j \right| \le \sum_{j \ne i} |a_{ij} x_j| = \sum_{j \ne i} |a_{ij}|\,|x_j| \le \sum_{j \ne i} |a_{ij}|\,|x_p| = R_i' |x_p| \qquad (6.2.4)$$

Thus, if k is any index such that $|x_k| = |x_p|$ we must have
$|\lambda - a_{kk}| \le R_k'$. But the hypothesis is that
$|\lambda - a_{ii}| \ge R_i'$ for all $i = 1, \dots, n$, so we must actually
have equality in both of the inequalities in (6.2.4) for i = k; that is,

$$|\lambda - a_{kk}|\,|x_k| = \sum_{j=1,\, j \ne k}^{n} |a_{kj}|\,|x_j| = \sum_{j=1,\, j \ne k}^{n} |a_{kj}|\,|x_k| = R_k' |x_k| \qquad (\dagger)$$

Since $|x_k| = \|x\|_\infty > 0$, assertion (a) follows from the identity

$$|\lambda - a_{kk}|\,|x_k| = R_k' |x_k|$$

Assertion (b) follows from the center identity in ($\dagger$),

$$\sum_{j=1,\, j \ne k}^{n} |a_{kj}| \left( |x_k| - |x_j| \right) = 0$$

because every term in this sum must be nonnegative. □


This lemma looks rather technical, but it has as an immediate
consequence the following useful result and its corollary.


6.2.5 Theorem. Let $A \in M_n$, and let $\lambda$ be an eigenvalue of A that
is a boundary point of G(A) or, more generally, satisfies the inequalities
(6.2.2a). Suppose that all the entries of A are nonzero. Then

(a) Every Geršgorin circle of A passes through $\lambda$; and

(b) If $Ax = \lambda x$, $x = [x_i] \ne 0$, then $|x_i| = |x_j|$ for all
$i, j = 1, \dots, n$.


Exercise. Deduce Theorem (6.2.5) from Lemma (6.2.3).


6.2.6 Corollary. Let $A = [a_{ij}] \in M_n$, and suppose that all the entries
of A are nonzero. If A is diagonally dominant and if $|a_{ii}| > R_i'$ for at
least one value of $i = 1, \dots, n$, then A is invertible.


Proof: If A were not invertible, then 0 would be an eigenvalue of A. Since
A is diagonally dominant, 0 cannot be an interior point of any Geršgorin
disc and hence $\lambda = 0$ satisfies the inequalities (6.2.2a). The theorem
says that every Geršgorin circle must pass through 0, but if
$|a_{ii}| > R_i'$, then the ith circle cannot pass through 0. □


The previous result is both useful and interesting, but we can do much
better (with regard to the assumption on zero entries in A) if we use more
carefully the information in Lemma (6.2.3).


6.2.7 Definition. A matrix $A = [a_{ij}] \in M_n$ is said to have property SC
if for every pair of distinct integers p, q with $1 \le p, q \le n$ there is a
sequence of distinct integers $k_1 = p, k_2, k_3, \dots, k_{m-1}, k_m = q$,
$1 \le m \le n$, such that all of the matrix entries
$a_{k_1 k_2}, a_{k_2 k_3}, \dots, a_{k_{m-1} k_m}$ are nonzero.

For example, the matrix (6.2.2) does not have property SC because the
pair 2,1 does not admit such a sequence of nonzero entries. The pair 1,2
does admit such a sequence, however.


Using this notion and Lemma (6.2.3), we can obtain the following
improvement on (6.2.5).




6.2.8 Better theorem. Let $A = [a_{ij}] \in M_n$, and suppose that $\lambda$
is an eigenvalue of A that is a boundary point of G(A), or, more generally,
satisfies the inequalities (6.2.2a). If A has property SC, then


(a) Every Geršgorin circle passes through $\lambda$; and
(b) If $Ax = \lambda x$ and $x = [x_i] \ne 0$, then $|x_i| = |x_j|$ for all
$i, j = 1, \dots, n$.


Proof: Let $Ax = \lambda x$ with $|x_i| \le |x_p| = \|x\|_\infty > 0$ for all
$i = 1, \dots, n$. Then $|\lambda - a_{pp}| = R_p'$ by Lemma (6.2.3). Let q be
any other index, $1 \le q \le n$, $q \ne p$. Since A has property SC, there
is a sequence of distinct indices $k_1 = p, k_2, k_3, \dots, k_m = q$ such
that all of the matrix entries $a_{k_1 k_2}, \dots, a_{k_{m-1} k_m}$ are
nonzero. Since $a_{k_1 k_2} = a_{p k_2} \ne 0$, we see by assertion (b) of
(6.2.3) that $|x_p| = |x_{k_2}|$. But then $a_{k_2 k_3} \ne 0$ and so
$|x_{k_3}| = |x_{k_2}| = |x_p|$. Proceeding in this way we conclude that
$|x_{k_i}| = |x_p|$ for all $i = 1, \dots, m$ and hence [by (6.2.3)(a)],
$|\lambda - a_{k_m k_m}| = |\lambda - a_{qq}| = R_q'$; that is, the qth
Geršgorin circle passes through $\lambda$ and $|x_q| = |x_p|$. But since q
was an arbitrary index, we conclude that every Geršgorin circle passes
through $\lambda$ and that all $|x_i| = |x_p|$, $i = 1, \dots, n$. □


Just as in (6.2.6), we can deduce a useful sufficient condition for 
invertibility from this result. 


6.2.9 Better corollary. Let $A = [a_{ij}] \in M_n$ and suppose that A has
property SC. If A is diagonally dominant and if $|a_{ii}| > R_i'$ for at
least one value of $i = 1, \dots, n$, then A is invertible.


Exercise. Deduce (6.2.9) from (6.2.8). 
Exercise. Show that the matrix (6.2.2) does not have property SC. 


What is this strange property SC? Notice that it has to do only with 
the locations of the off-diagonal nonzero entries of A - the main diagonal 
entries and the precise values of the off-diagonal entries are irrelevant. 
Because of this observation, we define two matrices related to A. 


6.2.10 Definition. If $A = [a_{ij}] \in M_{m,n}$ we set $|A| = [|a_{ij}|]$ and
$M(A) = [\mu_{ij}]$, where $\mu_{ij} = 1$ if $a_{ij} \ne 0$ and $\mu_{ij} = 0$
if $a_{ij} = 0$. The matrix M(A) is called the indicator matrix of A.


Exercise. Show that a matrix $A \in M_n$ has property SC if and only if
either (and hence both) |A| or M(A) has property SC.




The concept of a sequence of nonzero entries of A that arises in the 
statement of property SC can be summarized visually in terms of certain 
paths in a graph associated with A. 


6.2.11 Definition. The directed graph of $A \in M_n$, denoted by $\Gamma(A)$,
is the directed graph on n nodes $P_1, P_2, \dots, P_n$ such that there is a
directed arc in $\Gamma(A)$ from $P_i$ to $P_j$ if and only if $a_{ij} \ne 0$
($\mu_{ij} \ne 0$).


Examples 


$$A = \begin{bmatrix} 4 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$$


6.2.12 Definition. A directed path $\gamma$ in a graph $\Gamma$ is a sequence
of arcs $P_{i_1}P_{i_2}, P_{i_2}P_{i_3}, P_{i_3}P_{i_4}, \dots$ in $\Gamma$.
The ordered list of nodes in the directed path $\gamma$ is
$P_{i_1}, P_{i_2}, \dots$. The length of a directed path is the number of
successive arcs in the directed path if this number is finite; otherwise,
the directed path is said to have infinite length. A cycle is a directed
path that begins and ends at the same node; this node occurs exactly twice
in the ordered list of nodes in the path, and no other node occurs more
than once in the list; some authors would call this a simple directed
cycle. A cycle of length 1 is called a loop or trivial cycle.


6.2.13 Definition. A directed graph $\Gamma$ is strongly connected if
between every pair of distinct nodes $P_i$, $P_j$ in $\Gamma$ there is a
directed path of finite length that begins at $P_i$ and ends at $P_j$.


6.2.14 Theorem. Let $A \in M_n$. Then A has property SC if and only if the
directed graph $\Gamma(A)$ is strongly connected.


Exercise. Prove the theorem. 


Exercise. Show that $\Gamma$ is strongly connected if it has the property
that every pair of nodes belongs to at least one cycle, but that the
converse is not correct. Hint:

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$


There may be more than one directed path between two nodes of a
directed graph, but two such paths with different lengths may not be
essentially different; one may contain repetitions of one or more
subpaths. It is clear that if one ever visits a given node twice in going
along a directed path, then the directed path may be shortened (and the
end points will be unaffected) by deleting all the intermediate arcs
between the first and second visits to the node (the subgraph deleted is,
or contains, a cycle).


6.2.15 Observation. Let $\Gamma$ be a directed graph on n nodes. If there is
a directed path in $\Gamma$ between two given nodes, then between these
nodes there is a directed path that has length not greater than n - 1.


How can one tell if a given matrix A has property SC? This is equivalent
to checking whether $\Gamma(A)$ is strongly connected. If n is not large or
if M(A) has a special structure, then one can just inspect $\Gamma(A)$ and
trace out paths between all possible pairs of nodes. However, this is not
practical in general, so we need some explicit computational method.


6.2.16 Theorem. Let $A \in M_n$ be given, and let $P_i$ and $P_j$ be given
nodes of $\Gamma(A)$. There exists a directed path of length m in
$\Gamma(A)$ from $P_i$ to $P_j$ if and only if $(|A|^m)_{ij} \ne 0$ or,
equivalently, $[M(A)^m]_{ij} \ne 0$.


Proof: We proceed by induction. For m = 1 the assertion is trivial. For
m = 2 we compute

$$(|A|^2)_{ij} = \sum_{k=1}^{n} (|A|)_{ik} (|A|)_{kj} = \sum_{k=1}^{n} |a_{ik}|\,|a_{kj}|$$

so that $(|A|^2)_{ij} \ne 0$ if and only if for at least one value of k,
$a_{ik}$ and $a_{kj}$ are both nonzero. But this is the case if and only if
there exists a path of length 2 in $\Gamma(A)$ from $P_i$ to $P_j$. In
general, suppose the assertion has been proved for m = q. Then

$$(|A|^{q+1})_{ij} = \sum_{k=1}^{n} (|A|^q)_{ik} (|A|)_{kj} = \sum_{k=1}^{n} (|A|^q)_{ik} |a_{kj}| \ne 0$$

if and only if for at least one value of k, $(|A|^q)_{ik}$ and $|a_{kj}|$ are
both nonzero. This is equivalent to having a path from $P_i$ to $P_k$ of
length q and one from $P_k$ to $P_j$ of length 1, and this is the case if
and only if there is a path from $P_i$ to $P_j$ of length q + 1. The same
argument works for M(A). □
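
Theorem (6.2.16) translates directly into a computation with powers of the indicator matrix. The following Python sketch applies it to the matrix (6.2.2); the function names are our own, and nodes are numbered from 1 as in the text.

```python
def indicator(A):
    """Indicator matrix M(A): 1 where a_ij != 0, else 0."""
    return [[1 if a != 0 else 0 for a in row] for row in A]


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(M, m):
    """M^m for an integer m >= 1, by repeated multiplication."""
    result = M
    for _ in range(m - 1):
        result = mat_mul(result, M)
    return result


def has_path_of_length(A, i, j, m):
    """True if Gamma(A) has a directed path of length m from P_i to P_j."""
    return mat_power(indicator(A), m)[i - 1][j - 1] != 0


A = [[4, 2, 1],
     [0, 1, 1],
     [0, 1, 1]]          # the matrix (6.2.2)

print(has_path_of_length(A, 1, 2, 1))                          # arc P_1 -> P_2
print(any(has_path_of_length(A, 2, 1, m) for m in (1, 2, 3)))  # P_2 -> P_1
```

With integer arithmetic the (i, j) entry of $M(A)^m$ also counts the directed paths of length m from $P_i$ to $P_j$, although only its being nonzero matters for the theorem.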


6.2.17 Definition. Let $A = [a_{ij}] \in M_n$. We say that $A \ge 0$ (A is
nonnegative) if all its entries $a_{ij}$ are real and nonnegative. We say
that $A > 0$ (A is positive) if all its entries $a_{ij}$ are real and
positive.


6.2.18 Corollary. Let $A \in M_n$. Then $|A|^m > 0$ if and only if from each
node $P_i$ to each node $P_j$ in $\Gamma(A)$ there is a directed path in
$\Gamma(A)$ of length exactly m. The same is true for $M(A)^m$.


6.2.19 Corollary. Let $A \in M_n$. Then A has property SC if and only if
$(I + |A|)^{n-1} > 0$ or, equivalently, if $[I + M(A)]^{n-1} > 0$.


Proof: $(I + |A|)^{n-1} = I + (n-1)|A| + \binom{n-1}{2}|A|^2 + \cdots + |A|^{n-1} > 0$
if and only if for each pair (i, j) of nodes with $i \ne j$ at least one of
the terms $|A|, |A|^2, \dots, |A|^{n-1}$ has a positive (i, j) entry. But
Theorem (6.2.16) says this happens if and only if there is some directed
path in $\Gamma(A)$ from $P_i$ to $P_j$. This is equivalent to $\Gamma(A)$
being strongly connected, which is equivalent to A having property SC. □


Exercise. Prove the assertion in Corollary (6.2.19) involving M(A). 


6.2.20 Corollary. There is a path in $\Gamma(A)$ from $P_i$ to $P_j$,
$i \ne j$, if and only if $[(I + |A|)^{n-1}]_{ij} \ne 0$.




Exercise. Use Corollary (6.2.19) to give an explicit computational test for
property SC that involves only about $\log_2(n-1)$ matrix multiplications
instead of n - 2 matrix multiplications. Hint: Consider $(I + |A|)^2$, the
square of this, and so on.
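
One possible shape for such a test is sketched below in Python. It works with the zero pattern of $I + M(A)$ only (Boolean entries), and it relies on the fact that this pattern can only gain nonzero entries as the power increases and stops changing once the exponent reaches n - 1, so squaring until the exponent is at least n - 1 gives the same answer as the exact power. The function names are our own.

```python
def bool_mat_mul(X, Y):
    """Zero pattern of the product of two nonnegative matrices."""
    n = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def has_property_sc(A):
    """Test property SC via the pattern of (I + M(A))^(2^s), 2^s >= n - 1."""
    n = len(A)
    Z = [[(i == j) or (A[i][j] != 0) for j in range(n)] for i in range(n)]
    power = 1
    while power < n - 1:       # about log2(n - 1) squarings
        Z = bool_mat_mul(Z, Z)
        power *= 2
    return all(all(row) for row in Z)


print(has_property_sc([[4, 2, 1], [0, 1, 1], [0, 1, 1]]))   # matrix (6.2.2)
print(has_property_sc([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))   # 3-cycle
```

Using Boolean entries avoids the growth of the integer entries of $(I + M(A))^k$; only the positions of the nonzero entries matter for property SC.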


Before leaving this subject, we introduce one more equivalent
characterization of property SC. It is based on the fact that strong
connectivity of $\Gamma(A)$ is just a topological property of $\Gamma(A)$;
it has nothing to do with the labeling assigned to the nodes of
$\Gamma(A)$. If we permute the labels of the nodes, the graph stays either
strongly connected or not strongly connected. Notice that if we interchange
the ith and jth rows of A as well as the ith and jth columns, this has the
effect on $\Gamma(A)$ of interchanging the labels on nodes $P_i$ and $P_j$,
and vice versa.

Recall that a permutation matrix P is a square matrix, all of whose
entries are 0 or 1; in each row and column of P there is precisely one 1.
Clearly, such a matrix is unitary, hence orthogonal, so $P^T = P^{-1}$. The
simplest permutation matrix P has $p_{ij} = p_{ji} = 1$ for some fixed choice
of i, j and has all other nondiagonal entries 0. The similarity $P^TAP$ then
has the effect of interchanging the ith and jth columns of A as well as
interchanging the ith and jth rows of A. Any permutation of the rows and
columns of A can be obtained as a succession of such interchanges, and
any permutation matrix is a finite product of such simple permutation
matrices. Thus, if P is a permutation matrix, the similarity $P^TAP$ is
obtained from A by a suitable permutation of the rows and columns of A. It
is important to know whether some permutation of the rows and columns
of A can be found that brings A into the following special block form.


6.2.21 Definition. A matrix $A \in M_n$ is said to be reducible if either
(a) n = 1 and A = 0; or
(b) $n \ge 2$, there is a permutation matrix $P \in M_n$, and there is some
integer r with $1 \le r \le n - 1$ such that

$$P^TAP = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$$

where $B \in M_r$, $D \in M_{n-r}$, $C \in M_{r,n-r}$, and $0 \in M_{n-r,r}$ is a
zero matrix.


Note that we do not insist that the blocks B, C, and D have nonzero
entries, but only that we should be able to get an (n - r)-by-r block of
0 entries in the indicated position by some sequence of row and column
interchanges. If |A| > 0, clearly A is not reducible, and if A is reducible,
it must have at least (n - 1) 0 entries.


Remark: Suppose we want to solve the system of linear equations
Ax = y, and suppose that A is reducible. Then if we write
$\tilde{A} = P^TAP = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$, we have
$Ax = P\tilde{A}P^Tx = y$, or $\tilde{A}(P^Tx) = P^Ty$. Set
$P^Tx = \tilde{x} = [z^T\ \zeta^T]^T$ (unknown) and
$P^Ty = \tilde{y} = [w^T\ \omega^T]^T$ (known), where $z, w \in C^r$ and
$\zeta, \omega \in C^{n-r}$. Then the system of equations to be solved is
equivalent to $\tilde{A}\tilde{x} = \tilde{y}$, that is, to

$$Bz + C\zeta = w$$
$$D\zeta = \omega$$

If we solve $D\zeta = \omega$ first for $\zeta$, then use $\zeta$ in the first
equation and solve $Bz = w - C\zeta$ for z, we have reduced the original
problem to two smaller problems that should, in principle, be easier to
solve. It is this observation that motivates the term reducible.
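
The two-stage solution described in this remark can be sketched as follows in Python. The sketch assumes that A is already in the block form (so P = I), that the block size r is known, and that both B and D are nonsingular; the small Gaussian elimination routine `solve` and the sample data are our own illustrative choices.

```python
def solve(M, b):
    """Solve Mx = b by Gaussian elimination with partial pivoting
    (intended for small, nonsingular dense systems)."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def solve_block_triangular(A, y, r):
    """Solve Ax = y for A = [[B, C], [0, D]] with B r-by-r."""
    n = len(A)
    B = [row[:r] for row in A[:r]]
    C = [row[r:] for row in A[:r]]
    D = [row[r:] for row in A[r:]]
    w, omega = y[:r], y[r:]
    zeta = solve(D, omega)                                   # D zeta = omega
    rhs = [w[i] - sum(C[i][j] * zeta[j] for j in range(n - r))
           for i in range(r)]
    z = solve(B, rhs)                                        # B z = w - C zeta
    return z + zeta


A = [[2, 1, 1, 0],
     [1, 3, 0, 2],
     [0, 0, 4, 1],
     [0, 0, 1, 2]]
y = [1, 2, 3, 4]
x = solve_block_triangular(A, y, r=2)
residual = max(abs(sum(A[i][j] * x[j] for j in range(4)) - y[i])
               for i in range(4))
print(x, residual)
```

For a general reducible matrix one would first apply the permutation P to A and y and then undo it on the solution; that bookkeeping is omitted here.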


6.2.22 Definition. A matrix $A \in M_n$ is said to be irreducible if it is
not reducible.


6.2.23 Theorem. A matrix $A \in M_n$ is irreducible if and only if

$$(I + |A|)^{n-1} > 0$$

or, equivalently, if $[I + M(A)]^{n-1} > 0$.


Proof: We shall actually prove that A is reducible if and only if
$(I + |A|)^{n-1}$ has at least one 0 entry. Suppose first that A is reducible
and that for some permutation matrix P we have

$$A = P \begin{bmatrix} B & C \\ 0 & D \end{bmatrix} P^T = P\tilde{A}P^T$$

where B, C, 0, and D are block matrices as in Definition (6.2.21). Notice
that $|A| = |P\tilde{A}P^T| = P|\tilde{A}|P^T$ since the effect of P is only to
permute rows and columns; also notice that
$|\tilde{A}|^2, |\tilde{A}|^3, \dots, |\tilde{A}|^{n-1}$ all have the same
(n - r)-by-r block of 0's in the lower left corner as $\tilde{A}$. Thus

$$(I + |A|)^{n-1} = (I + P|\tilde{A}|P^T)^{n-1} = (P[I + |\tilde{A}|]P^T)^{n-1} = P[I + |\tilde{A}|]^{n-1}P^T$$
$$= P \left[ I + (n-1)|\tilde{A}| + \binom{n-1}{2}|\tilde{A}|^2 + \cdots + |\tilde{A}|^{n-1} \right] P^T$$

and all of the terms in the square brackets have an (n - r)-by-r block of
0's in the lower left corner. Thus, $(I + |A|)^{n-1}$ is reducible and hence
it cannot have all nonzero entries.

Conversely, suppose for some $p \ne q$ that the (p, q) entry of
$(I + |A|)^{n-1}$ is 0. Then we know that there is no directed path in
$\Gamma(A)$ from $P_p$ to $P_q$. Define the set of nodes

$$S_1 = \{ P_i : P_i = P_q \text{ or there is a path in } \Gamma(A) \text{ from } P_i \text{ to } P_q \}$$

and let $S_2$ contain all nodes of $\Gamma(A)$ that are not in $S_1$. Notice
that $S_1 \cup S_2 = \{P_1, \dots, P_n\}$ and $P_q \in S_1 \ne \emptyset$, so
$S_2 \ne \{P_1, \dots, P_n\}$. If there were a path from some node $P_i$ of
$S_2$ to some node $P_j$ of $S_1$, then (by definition of $S_1$) there would
be a path from $P_i$ to $P_q$ and so $P_i$ would already be in $S_1$. Thus,
there can be no paths from any node of $S_2$ to any node of $S_1$. Now
relabel the nodes so that $S_1 = \{\tilde{P}_1, \dots, \tilde{P}_r\}$ and
$S_2 = \{\tilde{P}_{r+1}, \dots, \tilde{P}_n\}$ and notice that

$$\tilde{A} = P^TAP = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix}, \quad B \in M_r, \quad 0 \in M_{n-r,r}$$

so that A is reducible. The argument for $[I + M(A)]^{n-1} > 0$ is just the
same. □


Let us summarize: 


6.2.24 Theorem. Let $A \in M_n$. The following are equivalent:


(a) A is irreducible;

(b) $(I + |A|)^{n-1} > 0$;

(c) $[I + M(A)]^{n-1} > 0$;

(d) $\Gamma(A)$ is strongly connected; and
(e) A has property SC.


6.2.25 Definition. Let $A \in M_n$. We say that A is irreducibly diagonally
dominant if

(a) A is irreducible;

(b) A is diagonally dominant, that is, $|a_{ii}| \ge R_i'(A)$ for all
$i = 1, \dots, n$; and

(c) For at least one value of i we have $|a_{ii}| > R_i'(A)$.


Exercise. Show by example that a matrix can be irreducible and
diagonally dominant without being irreducibly diagonally dominant.

In our present language, we can rephrase our “better theorem” (6.2.8)
and its corollary as follows:




6.2.26 Theorem. Let $A \in M_n$ be irreducible. A boundary point $\lambda$ of
the Geršgorin region G(A) [or, more generally, a point $\lambda$ satisfying
the inequalities (6.2.2a)] can be an eigenvalue of A only if every
Geršgorin circle passes through $\lambda$.


6.2.27 Corollary (Taussky). Let $A = [a_{ij}] \in M_n$ be irreducibly
diagonally dominant. Then

(a) A is invertible;

(b) If all $a_{ii} > 0$, then $\mathrm{Re}(\lambda_i) > 0$ for all eigenvalues
$\lambda_i$ of A; and

(c) If A is Hermitian (or, more generally, if A has only real
eigenvalues), and if all main diagonal entries of A are strictly positive,
then all the eigenvalues of A are strictly positive.


6.2.28 Corollary. Let $A \in M_n$ be irreducible, and suppose for at least
one value of i that

$$R_i = \sum_{j=1}^{n} |a_{ij}| < \|A\|_\infty$$

that is, not all absolute row sums equal the maximum absolute row sum.
Then $\rho(A) < \|A\|_\infty$. More generally, if $p_1, \dots, p_n > 0$, if
$D = \mathrm{diag}(p_1, p_2, \dots, p_n)$, and if
$R_i(D^{-1}AD) < \|D^{-1}AD\|_\infty$ for at least one value of i, then
$\rho(A) < \|D^{-1}AD\|_\infty$.


Proof: We always have the bound $\rho(A) \le \|A\|_\infty$ and we have
equality if and only if there is some eigenvalue $\lambda$ of A with
$|\lambda| = \|A\|_\infty$. But then by Theorem (6.2.26) every Geršgorin
circle must pass through $\lambda$. Our assumption that some
$R_i < \|A\|_\infty$ prevents this, however. The second assertion follows
from the same argument applied to $D^{-1}AD$. □


Problems 
1. Show that an irreducible matrix cannot have a 0 row or column.


2. Show by an example that the hypothesis of irreducibility in Corollary
(6.2.28) is necessary.


3. Suppose that $A = [a_{ij}] \in M_n$, that $\lambda$ is an eigenvalue of
$|A| = [|a_{ij}|]$, and that there is a vector $x = [x_i] \in R^n$ with all
$x_i > 0$ such that $|A|x = \lambda x$. Let
$D = \mathrm{diag}(x_1, x_2, \dots, x_n)$. Show that every Geršgorin circle of
$D^{-1}|A|D$ passes through $\lambda$. Draw a picture. What can you say about
the absolute row sums of $D^{-1}AD$?

4. It will be proved in Chapter 8 that a square matrix with positive
entries always has a positive eigenvalue and an associated positive
eigenvector. Use this fact and the preceding problem to show that

$$\rho(A) \le \rho(|A|)$$

whenever all entries of A are nonzero. Argue by continuity that the last
requirement may be dropped, so

$$\rho(A) \le \rho(|A|) \quad \text{for all } A \in M_n$$

5. Use Corollary (6.2.28) to show that Cauchy’s bound (5.6.40) on the
zeroes of the polynomial

$$p(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0, \quad a_0 \ne 0$$

can be improved slightly to

$$|z| < \max\{ |a_0|, |a_1| + 1, |a_2| + 1, \dots, |a_{n-1}| + 1 \}$$

under the assumption that it is not the case that

$$|a_0| = |a_1| + 1 = |a_2| + 1 = \cdots = |a_{n-1}| + 1$$

Hint: Show that the companion matrix C(p) in (5.6.39) is irreducible if
$a_0 \ne 0$. What improvements can be made in Montel’s bound (5.6.41),
Carmichael and Mason’s bound (5.6.42), Montel’s bound (5.6.43), and
Kojima’s bound (5.6.45)?
Further Reading. For a discussion of the Levy-Desplanques theorem and
many references to the literature, see O. Taussky, “A Recurring Theorem
on Determinants,” Amer. Math. Monthly 56 (1949), 672-676.


6.3 Perturbation theorems 


Let $D = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n) \in M_n$, let
$E = [e_{ij}] \in M_n$, and consider the perturbed matrix D + E. By Theorem
(6.1.1) the eigenvalues of D + E are contained in the discs

$$\left\{ z \in C : |z - \lambda_i - e_{ii}| \le R_i'(E) = \sum_{j=1,\, j \ne i}^{n} |e_{ij}| \right\}, \quad i = 1, \dots, n$$

which are contained in the discs

$$\left\{ z \in C : |z - \lambda_i| \le R_i(E) = \sum_{j=1}^{n} |e_{ij}| \right\}, \quad i = 1, \dots, n$$

Thus, if $\hat\lambda$ is an eigenvalue of D + E, there is some eigenvalue
$\lambda_i$ of D such that $|\hat\lambda - \lambda_i| \le \|E\|_\infty$.
Unfortunately, this simple estimate does not extend to the general
(nondiagonal) case, but we can use it to give a simple bound in the case
in which the matrix is diagonalizable.


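6.3.1 Observation. Let $A \in M_n$ be diagonalizable with
$A = S\Lambda S^{-1}$ and
$\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Let $E \in M_n$. If
$\hat\lambda$ is an eigenvalue of A + E, then there is some eigenvalue
$\lambda_i$ of A for which

$$|\hat\lambda - \lambda_i| \le \|S\|_\infty \|S^{-1}\|_\infty \|E\|_\infty = \kappa_\infty(S) \|E\|_\infty$$

where $\kappa_\infty(\cdot)$ denotes the condition number with respect to the
matrix norm $\|\cdot\|_\infty$.


Proof: Since A + E and $S^{-1}(A + E)S = \Lambda + S^{-1}ES$ have the same
eigenvalues and since $\Lambda$ is diagonal, the previous argument shows that
there is some $\lambda_i$ for which
$|\hat\lambda - \lambda_i| \le \|S^{-1}ES\|_\infty$. The stated inequality
follows since $\|\cdot\|_\infty$ is a matrix norm. □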


By a slight change in technique, we can generalize this result to matrix 
norms other than the maximum row sum norm. The key hypothesis on 
the matrix norm is satisfied for all induced matrix norms that are induced 
by a monotone or absolute vector norm; see (5.6.37). 


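6.3.2 Theorem. Let $A \in M_n$ be diagonalizable with $A = S\Lambda S^{-1}$
and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Let $E \in M_n$ and
let $\|\cdot\|$ be a matrix norm such that
$\|D\| = \max_{1 \le i \le n} |d_i|$ for all diagonal matrices
$D = \mathrm{diag}(d_1, \dots, d_n) \in M_n$. If $\hat\lambda$ is an eigenvalue
of A + E, then there is some eigenvalue $\lambda_i$ of A for which

$$|\hat\lambda - \lambda_i| \le \|S\|\,\|S^{-1}\|\,\|E\| = \kappa(S)\|E\| \qquad (6.3.3)$$

where $\kappa(\cdot)$ is the condition number with respect to the matrix norm
$\|\cdot\|$.


Proof: As in the previous result, it suffices to consider the eigenvalues
of $S^{-1}(A + E)S = \Lambda + S^{-1}ES$. If $\hat\lambda$ is an eigenvalue of
$\Lambda + S^{-1}ES$, then $\hat\lambda I - \Lambda - S^{-1}ES$ is singular. If
$\hat\lambda I - \Lambda$ is singular, then $\hat\lambda = \lambda_j$ for some
j and the bound (6.3.3) is trivially satisfied. Suppose, however, that
$\hat\lambda I - \Lambda$ is nonsingular. In this case, the matrix

$$(\hat\lambda I - \Lambda)^{-1}(\hat\lambda I - \Lambda - S^{-1}ES) = I - (\hat\lambda I - \Lambda)^{-1}S^{-1}ES$$

is singular, and hence by (5.6.16) it must be that
$\|(\hat\lambda I - \Lambda)^{-1}S^{-1}ES\| \ge 1$. Thus, because of the
assumption made about the behavior of the matrix norm $\|\cdot\|$ on
diagonal matrices, we have

$$1 \le \|(\hat\lambda I - \Lambda)^{-1}S^{-1}ES\| \le \|S^{-1}ES\|\,\|(\hat\lambda I - \Lambda)^{-1}\| = \|S^{-1}ES\| \max_{1 \le i \le n} |\hat\lambda - \lambda_i|^{-1} = \frac{\|S^{-1}ES\|}{\min_{1 \le i \le n} |\hat\lambda - \lambda_i|}$$

and hence

$$\min_{1 \le i \le n} |\hat\lambda - \lambda_i| \le \|S^{-1}ES\| \le \|S^{-1}\|\,\|S\|\,\|E\| = \kappa(S)\|E\| \qquad \square$$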


Exercise. Show that the assumption of the theorem concerning the
behavior of the matrix norm on diagonal matrices is satisfied for all of
the following norms: $\|\cdot\|_2$, $\|\cdot\|_\infty$, $\|\cdot\|_1$. Give an
example of at least one other matrix norm that satisfies the assumption.


Exercise. Give an example of a matrix norm that does not satisfy the
assumption of the theorem.


Exercise. Show that $\|U\|_2 = 1$ for any unitary matrix U.


Although the condition number «(+) arose previously in (5.8) in the 
context of error bounds for solutions of linear equations, we see that it 
now arises in (6.3.3) as an upper bound on the ratio of errors 


a 


IA Ail 
IEI 


in computing eigenvalues of a diagonalizable matrix. If «(S) is small 
(near 1), then small perturbations in the data may perturb the eigen- 
values, but changes in the eigenvalues will be bounded by a term of the 
same order as the changes in the data. If x(S) is very large, however, then 
small perturbations in the data may result in relatively large changes in 
the eigenvalues. 

Unlike the situation in (5.8) for solutions of linear equations, it is not 
x(A) that is of importance here, but «(S), where A = SAST" and S isa 
matrix whose columns are eigenvectors of A. The condition number with 
respect to the spectral norm has the geometrical interpretation that 
x(S) =cot(@/2), where @ is the least angle between Sx and Sy as x and 
y range over all possible orthogonal nonzero vectors [See Example 
(7.4.26)]. Thus, independent of the condition number of A, if a pair of 
linearly independent eigenvectors of A is nearly parallel, then two columns 


= xK(S) 


6.3 Perturbation theorems 367 


of S (say, columns p and q for p#q) may be nearly parallel, and hence 
the angle between Se, and Se, may be small even though the unit basis 
vectors ep and eg are orthogonal. In this event, the spectral condition 
number «(S) will be large, and the problem of determining the eigen- 
values of A may be ill conditioned. 

If S is unitary (or nearly unitary), however, then S will take pairs of 
orthogonal vectors into orthogonal (or nearly orthogonal) vectors and 
the spectral condition number of S will be small (equal to 1, in fact, if S is 
unitary). In this case, the problem of determining the eigenvalues of A 
must be well conditioned. Of course, a matrix can be (exactly) unitarily 
diagonalized if and only if it is normal, so (6.3.2) yields a perturbation 
theorem for the full class of normal (in particular, Hermitian or real 
symmetric) matrices, which is of the same simple form as our original 
observation about diagonal matrices. Normal matrices are perfectly
conditioned with respect to eigenvalue computations.


6.3.4 Corollary. Let $A \in M_n$ be a normal matrix with eigenvalues
$\lambda_1, \ldots, \lambda_n$ and let $E \in M_n$. If $\hat\lambda$ is an eigenvalue of A+E, then there is
some eigenvalue $\lambda_i$ of A for which $|\hat\lambda - \lambda_i| \le |||E|||_2$.


Notice that neither the perturbation matrix E nor the perturbed matrix 
A+E need be normal. Corollary (6.3.4) is most often applied in the case 
of a real symmetric matrix A. 
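A small numerical check of Corollary (6.3.4) may help fix ideas. This is a sketch under stated assumptions: the matrices A and E below are arbitrary illustrative choices, A is real symmetric (hence normal), E is deliberately not symmetric, and the 2-by-2 closed forms for eigenvalues and for the spectral norm stand in for a general eigenvalue routine.

```python
import cmath
import math

def eigenvalues_2x2(M):
    """Both eigenvalues of a 2-by-2 matrix from its trace and determinant."""
    (a, b), (c, d) = M
    half_trace = (a + d) / 2
    disc = cmath.sqrt(half_trace * half_trace - (a * d - b * c))
    return [half_trace + disc, half_trace - disc]

def spectral_norm_2x2(M):
    """Largest singular value of a real 2-by-2 matrix."""
    (a, b), (c, d) = M
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    return math.sqrt((p + r) / 2 + math.hypot((p - r) / 2, q))

A = [[2.0, 1.0], [1.0, 3.0]]          # real symmetric, hence normal (illustrative)
E = [[0.01, -0.03], [0.02, 0.00]]     # small perturbation, not symmetric (illustrative)
A_plus_E = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]

bound = spectral_norm_2x2(E)
for lam_hat in eigenvalues_2x2(A_plus_E):
    # Distance from the perturbed eigenvalue to the nearest eigenvalue of A.
    distance = min(abs(lam_hat - lam) for lam in eigenvalues_2x2(A))
    print(f"distance to spectrum of A = {distance:.5f}, bound = {bound:.5f}")
```

Each printed distance should not exceed the printed bound, whatever small E is substituted, as long as A itself stays normal.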


Exercise. Provide the details for a proof of Corollary (6.3.4). 


Exercise. Weyl's theorem (4.3.1) can be used to give a better bound than
the bound in (6.3.4) if one knows that both A and E are Hermitian. If
$A, E \in M_n$ are Hermitian, if $\lambda_1 \le \cdots \le \lambda_n$ are the ordered eigenvalues
of A, if $\hat\lambda_1 \le \cdots \le \hat\lambda_n$ are the ordered eigenvalues of A+E, and if
$\lambda_1(E) \le \cdots \le \lambda_n(E)$ are the ordered eigenvalues of E, use the inequalities
(4.3.2) to show that

$$\lambda_1(E) \le \hat\lambda_k - \lambda_k \le \lambda_n(E) \quad \text{for all } k = 1, 2, \ldots, n$$

and that

$$|\hat\lambda_k - \lambda_k| \le \rho(E) = |||E|||_2$$

Explain why this is a better bound than (6.3.4). What information does
this give if all the eigenvalues of E are known to be nonnegative?


It is not uncommon in numerical applications to have a situation in
which both the original matrix A and the perturbing matrix E are real
and symmetric. In this case, and in the more general situation in which
both A and A+E are normal, there is a comprehensive bound available
on the perturbations to all the eigenvalues.

6.3.5 Theorem (Hoffman and Wielandt). Let $A, E \in M_n$, assume that A
and A+E are both normal, let $\{\lambda_1, \ldots, \lambda_n\}$ be the eigenvalues of A in some
given order, and let $\{\hat\lambda_1, \ldots, \hat\lambda_n\}$ be the eigenvalues of A+E in some order.
Then there exists a permutation $\sigma(i)$ of the integers 1, 2, ..., n such that

$$\left[ \sum_{i=1}^n |\hat\lambda_{\sigma(i)} - \lambda_i|^2 \right]^{1/2} \le \|E\|_2 \tag{6.3.6}$$

Proof: Let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, let $\hat\Lambda = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_n)$, let $V \in M_n$ be a
unitary matrix such that $A = V\Lambda V^*$, and let $W \in M_n$ be a unitary matrix
such that $A + E = W\hat\Lambda W^*$. Then because the Frobenius norm is unitarily
invariant, we have

$$\|E\|_2^2 = \|(A+E) - A\|_2^2$$
$$= \|W\hat\Lambda W^* - V\Lambda V^*\|_2^2$$
$$= \|V^*W\hat\Lambda W^*V - \Lambda\|_2^2$$
$$= \|Z\hat\Lambda Z^* - \Lambda\|_2^2$$
$$= \mathrm{tr}\,(Z\hat\Lambda Z^* - \Lambda)(Z\hat\Lambda Z^* - \Lambda)^*$$
$$= \mathrm{tr}\,(\hat\Lambda\hat\Lambda^* + \Lambda\Lambda^*) - \mathrm{tr}\,(Z\hat\Lambda Z^*\Lambda^* + \Lambda Z\hat\Lambda^* Z^*)$$
$$= \sum_{i=1}^n (|\hat\lambda_i|^2 + |\lambda_i|^2) - 2\,\mathrm{Re}\,\mathrm{tr}\,(Z\hat\Lambda Z^*\Lambda^*)$$

where we have set $Z = V^*W$. This representation makes it clear that

$$\|E\|_2^2 \ge \sum_{i=1}^n (|\hat\lambda_i|^2 + |\lambda_i|^2) - 2\max\{\mathrm{Re}\,\mathrm{tr}\,(U\hat\Lambda U^*\Lambda^*) : U \text{ is unitary}\} \tag{6.3.7}$$

and we shall show that the exact value of this lower bound is the asserted
bound (6.3.6). If $U = [u_{ij}] \in M_n$, one computes easily that

$$\mathrm{Re}\,\mathrm{tr}\,(U\hat\Lambda U^*\Lambda^*) = \sum_{i,j=1}^n |u_{ij}|^2\,\mathrm{Re}(\bar\lambda_i\hat\lambda_j)$$

and we are interested in the maximum value of this expression as U
ranges over the compact set of all n-by-n unitary matrices. If we set
$c_{ij} = |u_{ij}|^2$ and let $C = [c_{ij}]$, the matrix $C \in M_n$ will be a matrix with
nonnegative entries and all of its row and column sums will be exactly +1
(because $UU^* = U^*U = I$). Thus, C is a doubly stochastic matrix whenever
U is unitary, and if we modify our extremum problem to range over all
doubly stochastic matrices, we shall gain the advantage of having a
maximum problem over a convex compact set whose structure is known.
The maximum over this larger domain is of course potentially larger:

$$\max\{\mathrm{Re}\,\mathrm{tr}\,(U\hat\Lambda U^*\Lambda^*) : U \text{ is unitary}\}$$
$$= \max\Big\{\sum_{i,j=1}^n |u_{ij}|^2\,\mathrm{Re}(\bar\lambda_i\hat\lambda_j) : U \text{ is unitary}\Big\}$$
$$\le \max\Big\{\sum_{i,j=1}^n c_{ij}\,\mathrm{Re}(\bar\lambda_i\hat\lambda_j) : C \text{ is doubly stochastic}\Big\}$$

But the function to be maximized is a linear function on a compact convex
set, so the maximum is attained at an extreme point of the convex set
(see Appendix B and note that a linear function is a convex function).
The extreme points of the set of doubly stochastic matrices are the
permutation matrices by Birkhoff's theorem (8.7.1), and hence there is a
permutation matrix $P \in M_n$ such that

$$\max\Big\{\sum_{i,j=1}^n c_{ij}\,\mathrm{Re}(\bar\lambda_i\hat\lambda_j) : C \text{ is doubly stochastic}\Big\} = \mathrm{Re}\,\mathrm{tr}\,(P\hat\Lambda P^T\Lambda^*)$$

Since a permutation matrix is unitary, we also have

$$\max\{\mathrm{Re}\,\mathrm{tr}\,(U\hat\Lambda U^*\Lambda^*) : U \text{ is unitary}\} = \mathrm{Re}\,\mathrm{tr}\,(P\hat\Lambda P^T\Lambda^*)$$

If $P^Te_i = e_{\sigma(i)}$ for i = 1, 2, ..., n, then

$$\mathrm{Re}\,\mathrm{tr}\,(P\hat\Lambda P^T\Lambda^*) = \sum_{i=1}^n \mathrm{Re}(\bar\lambda_i\hat\lambda_{\sigma(i)})$$

and (6.3.7) says that

$$\|E\|_2^2 \ge \sum_{i=1}^n \big(|\hat\lambda_{\sigma(i)}|^2 + |\lambda_i|^2 - 2\,\mathrm{Re}(\bar\lambda_i\hat\lambda_{\sigma(i)})\big) = \sum_{i=1}^n |\hat\lambda_{\sigma(i)} - \lambda_i|^2 \qquad \square$$

Theorem (6.3.5) says that there is a strong global stability to the set of
eigenvalues of a normal matrix, but only a suitable arrangement
of the eigenvalues will give the stated inequality. Not every arrangement
will do, and indeed there is at least one arrangement for which the
inequality in (6.3.6) is reversed (see Problem 7 at the end of this section).
But in the important special case of Hermitian matrices, the natural
ordering of the eigenvalues will do.
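The Hoffman and Wielandt bound can be checked on a small case. The following is a minimal sketch, assuming two illustrative real symmetric 2-by-2 matrices A and A+E (so both are normal) and the closed form for the eigenvalues of a symmetric 2-by-2 matrix. It evaluates the left side of (6.3.6) for every pairing of the eigenvalues and compares it with the Frobenius norm of E.

```python
import itertools
import math

def symmetric_eigenvalues_2x2(M):
    """Eigenvalues (ascending) of a real symmetric 2-by-2 matrix."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return [mean - radius, mean + radius]

def frobenius_norm(M):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(entry * entry for row in M for entry in row))

A = [[1.0, 2.0], [2.0, -1.0]]      # illustrative symmetric matrix
E = [[0.3, -0.1], [-0.1, 0.2]]     # illustrative symmetric perturbation
B = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]

lam = symmetric_eigenvalues_2x2(A)
lam_hat = symmetric_eigenvalues_2x2(B)

# Left side of (6.3.6) for every pairing sigma of the eigenvalues.
pairing_distance = {
    sigma: math.sqrt(sum((lam_hat[sigma[i]] - lam[i]) ** 2 for i in range(2)))
    for sigma in itertools.permutations(range(2))
}
print("Frobenius norm of E:", frobenius_norm(E))
for sigma, dist in pairing_distance.items():
    print("pairing", sigma, "distance", dist)
```

The pairing (0, 1) is the natural (increasing) ordering and stays below the norm of E, while the reversed pairing exceeds it, in line with the remark that not every arrangement will do.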


6.3.8 Corollary. Let $A, E \in M_n$, assume that A is Hermitian and that
A+E is normal, let $\{\lambda_1, \ldots, \lambda_n\}$ be the eigenvalues of A arranged in
increasing order ($\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$), and let $\{\hat\lambda_1, \ldots, \hat\lambda_n\}$ be the
eigenvalues of A+E, ordered so that $\mathrm{Re}\,\hat\lambda_1 \le \mathrm{Re}\,\hat\lambda_2 \le \cdots \le \mathrm{Re}\,\hat\lambda_n$. Then

$$\left[ \sum_{i=1}^n |\hat\lambda_i - \lambda_i|^2 \right]^{1/2} \le \|E\|_2 \tag{6.3.9}$$

Proof: By the theorem, there is some permutation $\sigma$ of the given order
(increasing real parts) for the eigenvalues of A+E for which

$$\left[ \sum_{i=1}^n |\hat\lambda_{\sigma(i)} - \lambda_i|^2 \right]^{1/2} \le \|E\|_2$$

If the eigenvalues of A+E in the list $\hat\lambda_{\sigma(1)}, \ldots, \hat\lambda_{\sigma(n)}$ are already in
increasing order of their real parts, there is nothing to prove. If not, there are
two successive eigenvalues in the list that are not ordered in this way, say

$$\mathrm{Re}\,\hat\lambda_{\sigma(k)} > \mathrm{Re}\,\hat\lambda_{\sigma(k+1)} \quad \text{for some } k \text{ such that } 1 \le k < n$$

But since

$$|\hat\lambda_{\sigma(k)} - \lambda_k|^2 + |\hat\lambda_{\sigma(k+1)} - \lambda_{k+1}|^2 = |\hat\lambda_{\sigma(k+1)} - \lambda_k|^2 + |\hat\lambda_{\sigma(k)} - \lambda_{k+1}|^2 + 2(\lambda_k - \lambda_{k+1})(\mathrm{Re}\,\hat\lambda_{\sigma(k+1)} - \mathrm{Re}\,\hat\lambda_{\sigma(k)})$$

and since $\lambda_k - \lambda_{k+1} \le 0$ by assumption, we see that

$$|\hat\lambda_{\sigma(k)} - \lambda_k|^2 + |\hat\lambda_{\sigma(k+1)} - \lambda_{k+1}|^2 \ge |\hat\lambda_{\sigma(k+1)} - \lambda_k|^2 + |\hat\lambda_{\sigma(k)} - \lambda_{k+1}|^2$$

Thus, the two eigenvalues $\hat\lambda_{\sigma(k)}$ and $\hat\lambda_{\sigma(k+1)}$ can be interchanged without
increasing the sum of squared differences. By a finite sequence of such
interchanges, the list of eigenvalues $\hat\lambda_{\sigma(1)}, \ldots, \hat\lambda_{\sigma(n)}$ can be transformed
into the list $\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_n$ in which the real parts are increasing and the
asserted bound holds. $\square$

In practice, the most common application of this corollary is to the case
in which both A and A+E are Hermitian, or even real and symmetric.

Exercise. Show that if $A, B \in M_n$ are Hermitian and if their eigenvalues
are both arranged in increasing or decreasing order, then

$$\left[ \sum_{i=1}^n [\lambda_i(A) - \lambda_i(B)]^2 \right]^{1/2} \le \|A - B\|_2$$

Exercise. Show that the result in Theorem (6.3.5) need not be true if not
both of A and B = A+E are normal. Hint: Let
$A = \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix}$, $B = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}$
and show that

$$\sum_{i=1}^2 [\lambda_i(A) - \lambda_i(B)]^2 = 16$$

for any ordering of the eigenvalues.

If A is not diagonalizable, there are no bounds known that are as
simple as those in Theorem (6.3.2). It is possible, however, to derive an
explicit formula which expresses how the algebraically simple eigenvalues
(algebraic multiplicity equal to 1) of a matrix vary when the entries
are perturbed. We require first a lemma on nonorthogonality of the left
and right eigenvectors associated with a simple eigenvalue.

6.3.10 Lemma. If $A \in M_n$, if $\lambda$ is an algebraically simple eigenvalue of
A, and if x and y are right and left eigenvectors, respectively,
corresponding to the eigenvalue $\lambda$ of A, then $y^*x \ne 0$.

Proof: If $Ax = \lambda x$, $x \ne 0$, then we may use the procedure employed in the
proof of the Schur triangularization theorem (2.3.1) to construct a
unitary matrix U whose first column is $x/\|x\|_2$ and for which

$$U^*AU = \begin{bmatrix} \lambda & * \\ 0 & B \end{bmatrix}$$

Since $\lambda$ is a simple eigenvalue of A, it cannot be an eigenvalue of B. The
unit basis vector $e_1$ is the $\lambda$ eigenvector of $U^*AU$. Now consider

$$(U^*AU)^* = U^*A^*U = \begin{bmatrix} \bar\lambda & 0 \\ * & B^* \end{bmatrix}$$

and suppose that $U^*A^*Uz = \bar\lambda z$ with $z \ne 0$. If $z^* = [0 \;\; \zeta^*]$, then $\zeta \ne 0$ and $\zeta$
is a $\bar\lambda$ eigenvector of $B^*$. But then $\lambda$ would be an eigenvalue of B, which is
excluded by assumption. We conclude that z cannot have a zero first
component; that is, $z^*e_1 \ne 0$. But then $(Uz)^*(Ue_1) = z^*e_1 \ne 0$, and the
vectors Uz and $Ue_1$ are left and right $\lambda$ eigenvectors of A. Since the left
and right $\lambda$ eigenspaces of A are one-dimensional by assumption, we
must have $y = \alpha Uz$ for some $\alpha \ne 0$. But $x = \|x\|_2 Ue_1$, so it must be that
$y^*x \ne 0$. $\square$

Exercise. Consider a Jordan block such as $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and show that the lemma
is false if "algebraically simple" is omitted from the hypotheses.

Now suppose that $\lambda$ is an algebraically simple eigenvalue of A. Then
A has a uniquely (up to a scalar factor $\alpha$, $|\alpha| = 1$) determined normalized
right $\lambda$ eigenvector x and a uniquely determined left $\lambda$ eigenvector y, which

is normalized by the relation $y^*x = 1$. If we consider a differentiable
parameterization A(t) such that A(0) = A [e.g., A(t) = A + tE for a fixed
perturbation matrix E], then for all sufficiently small t there is a uniquely
determined simple eigenvalue $\lambda(t)$ of A(t) such that $\lambda(0) = \lambda$. There is
also a right $\lambda(t)$ eigenvector x(t), which is uniquely (up to a factor $\alpha$ as
before) determined by the condition $x^*(t)x(t) = 1$, and a left $\lambda(t)$
eigenvector y(t), which is uniquely determined by the condition $y^*(t)x(t) = 1$.
If we differentiate this last normalization condition, we obtain the
identity

$$y^{*\prime}(t)x(t) + y^*(t)x'(t) = 0 \tag{6.3.11}$$

Since $A(t)x(t) = \lambda(t)x(t)$ for all small t, we also have the identity
$y^*(t)A(t)x(t) = \lambda(t)y^*(t)x(t) = \lambda(t)$. If we differentiate this identity, we
obtain

$$\lambda'(t) = y^{*\prime}(t)A(t)x(t) + y^*(t)A'(t)x(t) + y^*(t)A(t)x'(t)$$

But since $A(t)x(t) = \lambda(t)x(t)$ and $y^*(t)A(t) = \lambda(t)y^*(t)$, this becomes

$$\lambda'(t) = \lambda(t)\,[y^{*\prime}(t)x(t) + y^*(t)x'(t)] + y^*(t)A'(t)x(t) = y^*(t)A'(t)x(t)$$

We have used the identity (6.3.11). At t = 0 this is just the identity
$\lambda'(0) = y^*A'(0)x$, subject to the normalizations $x^*x = 1$ and $y^*x = 1$. If x and y
are right and left $\lambda$ eigenvectors that are not necessarily normalized in this
way, we can replace x by $x/(x^*x)^{1/2}$ and y by $(x^*x)^{1/2}\,y/(x^*y)$ to obtain the
general identity $\lambda'(0)\,y^*x = y^*A'(0)x$. We have proved the following result
for a matrix A, which need not be diagonalizable.

6.3.12 Theorem. Let $A(t) \in M_n$ be differentiable at t = 0. Assume that $\lambda$
is an algebraically simple eigenvalue of A(0) and that $\lambda(t)$ is an
eigenvalue of A(t), for small t, such that $\lambda(0) = \lambda$. Let x be a right $\lambda$
eigenvector of A and let y be a left $\lambda$ eigenvector of A. Then

$$\lambda'(0) = \frac{y^*A'(0)x}{y^*x}$$

Exercise. Let A(t) = A + tE for a fixed perturbation matrix E and show
(under the assumptions of the theorem) that

$$\frac{d\lambda}{dt} = \frac{y^*Ex}{y^*x} \quad \text{at } t = 0$$

Exercise. Under the assumptions of the theorem, show that

$$\frac{\partial\lambda}{\partial a_{ij}} = \frac{\bar y_i x_j}{y^*x}$$

for any i, j. This formula shows how $\lambda$ varies with respect to changes in
any element of A. Hint: Let $E = E_{ij}$, the n-by-n matrix whose only
nonzero entry is a one in the i, j position.

Exercise. Consider the matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1+\varepsilon \end{bmatrix}$ and the eigenvalue $\lambda = 1$,
which is simple if $\varepsilon \ne 0$. Compute $\partial\lambda/\partial a_{ij}$ explicitly for all four pairs i, j.
How do these variations behave as $\varepsilon \to 0$? Conclude that an eigenvalue $\lambda$
can be very sensitive to certain perturbations in A if x and y are nearly
orthogonal.

In contrast to the situation for eigenvalues, the eigenvectors of even a
diagonalizable matrix may suffer radical changes with only small
perturbations in the entries of the matrix. For example, if
$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $E = \begin{bmatrix} \varepsilon & \delta \\ 0 & 0 \end{bmatrix}$ for $\varepsilon, \delta \ne 0$,
then the eigenvalues of A+E are $\lambda = 1$ and $1+\varepsilon$,
and the respective normalized eigenvectors are

$$\frac{1}{(\varepsilon^2 + \delta^2)^{1/2}} \begin{bmatrix} -\delta \\ \varepsilon \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

By choosing the ratio of $\varepsilon$ to $\delta$ appropriately, the first eigenvector can be
chosen to point in any direction whatsoever, for $\varepsilon$ and $\delta$ both arbitrarily
small.

If we set $\varepsilon = 0$, then the perturbed matrix $A + E = \begin{bmatrix} 1 & \delta \\ 0 & 1 \end{bmatrix}$ has only one
independent eigenvector for any $\delta \ne 0$, whereas A itself has two
independent eigenvectors.

All our estimates so far have been a priori bounds on the perturbations
induced in the eigenvalues; they do not involve the computed eigenvalues
or eigenvectors or any quantity derived from them. Suppose that
an "approximate eigenvector" $\hat x \ne 0$ and an "approximate eigenvalue" $\hat\lambda$
have been found somehow. It may not be the case that $A\hat x$ is exactly
equal to $\hat\lambda\hat x$, but when A is diagonalizable we can use the residual vector
$r = A\hat x - \hat\lambda\hat x$ to obtain an estimate of how close $\hat\lambda$ is to an eigenvalue.
Write $A = S\Lambda S^{-1}$, and suppose that $\hat\lambda$ is not exactly equal to any
eigenvalue of A. Then

$$r = A\hat x - \hat\lambda\hat x = S(\Lambda - \hat\lambda I)S^{-1}\hat x$$

so that $\hat x = S(\Lambda - \hat\lambda I)^{-1}S^{-1}r$. Then

$$\|\hat x\| = \|S(\Lambda - \hat\lambda I)^{-1}S^{-1}r\| \le |||S(\Lambda - \hat\lambda I)^{-1}S^{-1}|||\;\|r\|$$
$$\le |||S|||\;|||S^{-1}|||\;|||(\Lambda - \hat\lambda I)^{-1}|||\;\|r\|$$
$$= \kappa(S)\,|||(\Lambda - \hat\lambda I)^{-1}|||\;\|r\|$$
$$= \kappa(S)\Big(\min_{1 \le i \le n} |\lambda_i - \hat\lambda|\Big)^{-1}\|r\|$$

so that

$$\min_{1 \le i \le n} |\hat\lambda - \lambda_i| \le \kappa(S)\,\frac{\|r\|}{\|\hat x\|}$$

Obviously, the latter inequality holds even when some $\lambda_i = \hat\lambda$. For this
argument, we have assumed that

(a) $\|\cdot\|$ is a vector norm on $C^n$;
(b) The matrix norm $|||\cdot|||$ on $M_n$ is compatible with $\|\cdot\|$; and        (6.3.13)
(c) $|||D||| = \max_{1 \le i \le n} |d_i|$ whenever $D = \mathrm{diag}(d_1, \ldots, d_n) \in M_n$;

and the condition number $\kappa(S)$ is computed using the matrix norm $|||\cdot|||$.
If A is normal, S may be taken to be unitary, and if we use the $l_2$ vector
norm and the spectral matrix norm, we have $\kappa(S) = 1$. The condition (c) is
equivalent to requiring that the matrix norm $|||\cdot|||$ be induced by a
monotone vector norm [Theorem (5.6.37)]. Thus, all of the conditions (6.3.13)
are met if $\|\cdot\|$ is a monotone vector norm on $C^n$ and $|||\cdot|||$ is the matrix
norm on $M_n$ that is induced by $\|\cdot\|$. We have proved a result about a
posteriori bounds which is of the same type as Theorem (6.3.2) and
Corollary (6.3.4).

6.3.14 Theorem. Let $A \in M_n$ be diagonalizable with $A = S\Lambda S^{-1}$ and
$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Let the vector norm $\|\cdot\|$ on $C^n$ and the matrix norm
$|||\cdot|||$ on $M_n$ satisfy the conditions (6.3.13), let $\hat x \in C^n$ be a given nonzero
vector, let $\hat\lambda$ be a given complex number, and let $r = A\hat x - \hat\lambda\hat x$. Then there
is some eigenvalue $\lambda_i$ of A for which

$$|\hat\lambda - \lambda_i| \le |||S|||\;|||S^{-1}|||\,\frac{\|r\|}{\|\hat x\|} = \kappa(S)\,\frac{\|r\|}{\|\hat x\|}$$

If A is normal, then there is some eigenvalue $\lambda_i$ of A for which

$$|\hat\lambda - \lambda_i| \le \frac{\|r\|_2}{\|\hat x\|_2} \tag{6.3.16}$$

The latter result should be contrasted with the corresponding result
for a posteriori bounds on the relative error in the solution to a system of
linear equations. If the matrix of coefficients of a system of linear
equations is ill-conditioned, (5.8.11) says that a small residual does not imply
a small relative error in the solution. However, (6.3.16) says that if A is
normal (in practice, A will usually be Hermitian or real symmetric), and
if an approximate eigenvector-eigenvalue pair has a small residual, then
the absolute error in the eigenvalue is guaranteed to be small; no
condition number appears in the bound.

This favorable bound for the eigenvalues is not matched by a similarly
favorable result for the eigenvectors. Even for a real symmetric matrix, a
small residual does not guarantee that the approximate eigenvector is
close to an eigenvector. For example, consider $A = \begin{bmatrix} 1 & \varepsilon \\ \varepsilon & 1 \end{bmatrix}$ for $\varepsilon > 0$. If we
take $\hat\lambda = 1$ and $\hat x = [1, 0]^T$, then the residual is $r = [0, \varepsilon]^T$. The eigenvectors
are $[1, 1]^T$ and $[1, -1]^T$ for all $\varepsilon > 0$, and $\hat x$ is not approximately
parallel to either of these two vectors no matter how small $\varepsilon$ is.

Exercise. Show that the eigenvalues of A in the preceding example are
$1+\varepsilon$ and $1-\varepsilon$, and verify the bound (6.3.16) in this case.

Problems

1. If $\lambda$, $\mu$ are eigenvalues of A and $\lambda \ne \mu$, show that any left eigenvector
of A corresponding to $\mu$ is orthogonal to any right eigenvector of A
corresponding to $\lambda$.

3. Verify the lament expressed in the last sentence of the first paragraph
of this section by considering

$$A_\varepsilon = \begin{bmatrix} 0 & 1 \\ \varepsilon & 0 \end{bmatrix} \quad \text{and} \quad A_0 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

where $\varepsilon > 0$ is small. Show that $A_\varepsilon$ is diagonalizable for $\varepsilon > 0$ and that the
distance between an eigenvalue of $A_\varepsilon$ and any eigenvalue of $A_0$
is $\sqrt{\varepsilon}$. Write $A_\varepsilon = A_0 + E$, and show that

$$\frac{|\hat\lambda - \lambda_i|}{|||E|||} = O(\varepsilon^{-1/2}) \to \infty \quad \text{as } \varepsilon \to 0$$

so no bound of the form $|\hat\lambda - \lambda_i| \le |||E|||$ can be correct in general. Now
evaluate the bounds in Theorem (6.3.2) in this case and explain what is
happening.

4. Consider the polynomial $p(x) = (x - x_0)^2$, which has a double root at
$x_0$, so $p(x_0) = p'(x_0) = 0$, but $p''(x_0) \ne 0$. Show that for small $\varepsilon > 0$,
$p(x) - \varepsilon$ has two roots near $x_0$ of the form $x_0 \pm \varepsilon^{1/2}$. Thus, a change of
order $\varepsilon$ in the coefficients of a polynomial can perturb its zeroes by an
amount of order $\sqrt{\varepsilon}$. For a polynomial, the ratio of the perturbation
in a zero to a perturbation in the coefficients can be unbounded.




5. Consider the bound (6.3.4), which says that for a Hermitian (or, 
more generally, normal) matrix, the ratio of the perturbation in the 
eigenvalue to the perturbation in the matrix elements is bounded. Since 
the eigenvalues of the matrix are just the roots of the characteristic equa- 
tion, explain how this pleasant situation could be consistent with the con- 
clusion of Problem 4. The moral is that, as a practical matter, it is very 
unwise to compute the eigenvalues of a Hermitian (or any other) matrix 
by forming the characteristic polynomial and then computing its zeroes. 
This has the potential of turning an inherently well-conditioned problem 
into an ill-conditioned one! 
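The contrast drawn in Problems 4 and 5 can be seen in a minimal numerical sketch. It assumes the 2-by-2 identity matrix as the test case (an illustrative choice): its characteristic polynomial is $x^2 - 2x + 1$, and a change of size eps in the constant coefficient moves the roots by about the square root of eps, while a symmetric perturbation of size eps in the matrix itself moves the eigenvalues by only eps.

```python
import math

def quadratic_roots(b, c):
    """Real roots of x^2 + b x + c, assuming a nonnegative discriminant."""
    disc = math.sqrt(b * b - 4 * c)
    return ((-b - disc) / 2, (-b + disc) / 2)

for eps in (1e-4, 1e-8, 1e-12):
    # Perturb only the constant coefficient of x^2 - 2x + 1 by eps.
    roots = quadratic_roots(-2.0, 1.0 - eps)
    root_shift = max(abs(root - 1.0) for root in roots)
    # The symmetric matrix I + diag(eps, -eps) has eigenvalues 1 + eps and 1 - eps.
    eigenvalue_shift = eps
    print(f"eps = {eps:.0e}: root shift = {root_shift:.2e}, "
          f"eigenvalue shift = {eigenvalue_shift:.2e}")
```

The gap between the two printed shifts widens as eps decreases, which is the practical reason for avoiding the characteristic polynomial in eigenvalue computations.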


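6. Consider Givens's example of a real symmetric 2-by-2 matrix A = I
and a real symmetric perturbation

$$E(\varepsilon) = \begin{bmatrix} \varepsilon\cos(2/\varepsilon) & \varepsilon\sin(2/\varepsilon) \\ \varepsilon\sin(2/\varepsilon) & -\varepsilon\cos(2/\varepsilon) \end{bmatrix}, \quad \varepsilon > 0$$

with $E(0) = \lim_{\varepsilon \to 0} E(\varepsilon) = 0$. Show that the eigenvalues of $A + E(\varepsilon)$ are
$1+\varepsilon$ and $1-\varepsilon$, and that the respective (uniquely determined up to sign)
normalized real eigenvectors of $A + E(\varepsilon)$ are $[\cos(1/\varepsilon), \sin(1/\varepsilon)]^T$ and
$[\sin(1/\varepsilon), -\cos(1/\varepsilon)]^T$ for $\varepsilon > 0$. Show that as $\varepsilon \to 0$, each eigenvector
points in any given direction infinitely often. Thus, even if we restrict
attention to real symmetric matrices, an individual eigenvector may vary
rapidly if its eigenvalue is not well separated from others.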


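7. Use the argument of Theorem (6.3.5) to show that (under the
hypotheses of the theorem) there is a permutation $\tau$ of the integers 1, 2, ..., n
such that

$$\left[ \sum_{i=1}^n |\hat\lambda_{\tau(i)} - \lambda_i|^2 \right]^{1/2} \ge \|E\|_2$$

Hint: Consider $\min\{\sum_{i,j=1}^n c_{ij}\,\mathrm{Re}(\bar\lambda_i\hat\lambda_j) : C = [c_{ij}] \text{ is doubly stochastic}\}$.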


8. Let $A \in M_n$ be a given normal matrix with eigenvalues $\{\lambda_i(A)\}$, let
r > 0 be given, and define

$$S(A, r) = \{B \in M_n : B \text{ is normal and } \|B - A\|_2 \le r\}$$

Show that $\{\hat\lambda_1, \ldots, \hat\lambda_n\}$ is the set of eigenvalues of a matrix $B \in S(A, r)$ if
and only if

$$\min\Big\{ \Big[ \sum_{i=1}^n |\hat\lambda_{\sigma(i)} - \lambda_i(A)|^2 \Big]^{1/2} : \sigma \text{ is a permutation of } 1, \ldots, n \Big\} \le r$$

This gives a complete characterization of the possible sets of
eigenvalues of normal matrices in a neighborhood of a given normal matrix.
Hint: Use Theorem (6.3.5) for necessity. For sufficiency, suppose that
$A = U\Lambda U^*$ with $\Lambda = \mathrm{diag}(\lambda_1(A), \ldots, \lambda_n(A))$ and define $B = U\hat\Lambda U^*$ with
$\hat\Lambda = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_n)$.


9. In the proof of Theorem (6.3.5) we used the fact that if $U = [u_{ij}] \in M_n$
is unitary, then $A = [|u_{ij}|^2]$ is doubly stochastic. Show that not every
doubly stochastic matrix arises in this way from a unitary matrix. Hint:
Consider the example

$$\frac{1}{2}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$$


10. Suppose $A \in M_3$ is a given Hermitian matrix, and suppose one has
found somehow a unitary matrix U such that

$$UAU^* = \begin{bmatrix} 3.05 & -.06 & .02 \\ -.06 & -6.91 & .07 \\ .02 & .07 & 8.44 \end{bmatrix}$$

Give the best estimate you can for the eigenvalues of A.


11. There is no hope of having a bound of the type (6.3.4) for non- 
normal matrices. Consider A, E €e M,, where 


r ~y 


0 a 
_ 0 0 
A= 
0 
0 
L 0 
o 0 | 
e 0 
E = ‘ ; a,e=0 
o 0 Oa 
e 0 vee 0 | 


Show that all the eigenvalues of A are 0, and the eigenvalues of A+E are
the n distinct values for $\sqrt[n]{\alpha\varepsilon^{n-1}}$. Whatever the value of $\varepsilon > 0$, all the
eigenvalues of A+E can be made arbitrarily large by a suitable choice of
$\alpha$. How is the situation different when A is normal?


Further Readings. The original version of Theorem (6.3.5) is in A. J.
Hoffman and H. Wielandt, "The Variation of the Spectrum of a Normal
Matrix," Duke Math. J. 20 (1953), 37-39. An elementary proof of this
result in the real symmetric case is in [Wil], pp. 104-109.


6.4 Other inclusion regions

We have discussed the Geršgorin discs in some detail. They are a
particular class of easily computed regions in the plane that are guaranteed
to include the eigenvalues of a given matrix. Many authors, perhaps
attracted by the geometrical elegance of the Geršgorin theory, have
generalized the ideas and methods of this theory to obtain other types of
inclusion regions. We discuss a few of these to give the flavor of what has
been done.

The first result gives an eigenvalue inclusion region that is a union of
discs, like the Geršgorin region, but the radii of the discs depend on both
the deleted row and column sums. The separate row and column sum
versions of Geršgorin's theorem are obtained as limits in this result, due to
Ostrowski, which may therefore be viewed as giving a continuum of
inclusion regions that interpolate between (6.1.2) and (6.1.4).


6.4.1 Theorem (Ostrowski). Let $A = [a_{ij}] \in M_n$, let $\alpha \in [0, 1]$ be given,
and let $R_i'$ and $C_i'$ denote the deleted row and column sums of A,
respectively:

$$R_i' = \sum_{j=1,\, j \ne i}^n |a_{ij}| \tag{6.4.2}$$

$$C_i' = \sum_{j=1,\, j \ne i}^n |a_{ji}| \tag{6.4.3}$$

Then all the eigenvalues of A are located in the union of n discs

$$\bigcup_{i=1}^n \{z \in C : |z - a_{ii}| \le R_i'^{\,\alpha}\, C_i'^{\,1-\alpha}\} \tag{6.4.4}$$


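Proof: We may assume that $0 < \alpha < 1$, as the cases $\alpha = 0$ and $\alpha = 1$
(Geršgorin's theorems for column and row sums, respectively) can be
obtained by taking limits. Furthermore, we may assume that all $R_i' > 0$,
because we may perturb A by inserting a small nonzero entry into any
row in which $R_i' = 0$; the resulting matrix has an inclusion region (6.4.4)
that is larger than the region for A, and the result follows in the limit as
the perturbation goes to zero.

Now suppose that $Ax = \lambda x$ with $x = [x_i] \ne 0$. Then for each
i = 1, 2, ..., n we have

$$|\lambda - a_{ii}|\,|x_i| = \Big|\sum_{j \ne i} a_{ij}x_j\Big| \le \sum_{j \ne i} |a_{ij}|\,|x_j| = \sum_{j \ne i} |a_{ij}|^{\alpha}\,\big(|a_{ij}|^{1-\alpha}|x_j|\big)$$
$$\le \Big[\sum_{j \ne i} \big(|a_{ij}|^{\alpha}\big)^{1/\alpha}\Big]^{\alpha} \Big[\sum_{j \ne i} \big(|a_{ij}|^{1-\alpha}|x_j|\big)^{1/(1-\alpha)}\Big]^{1-\alpha} \tag{6.4.5a}$$
$$= R_i'^{\,\alpha} \Big[\sum_{j \ne i} |a_{ij}|\,|x_j|^{1/(1-\alpha)}\Big]^{1-\alpha}$$

which, since $R_i' > 0$, is equivalent to

$$\frac{|\lambda - a_{ii}|}{R_i'^{\,\alpha}}\,|x_i| \le \Big[\sum_{j \ne i} |a_{ij}|\,|x_j|^{1/(1-\alpha)}\Big]^{1-\alpha}$$

and hence

$$\Big(\frac{|\lambda - a_{ii}|}{R_i'^{\,\alpha}}\Big)^{1/(1-\alpha)} |x_i|^{1/(1-\alpha)} \le \sum_{j \ne i} |a_{ij}|\,|x_j|^{1/(1-\alpha)} \tag{6.4.5b}$$

The inequality employed in (6.4.5a) is Hölder's inequality (Appendix B)
with $p = 1/\alpha$ and $q = p/(p-1) = 1/(1-\alpha)$. Now sum (6.4.5b) on i to get

$$\sum_{i=1}^n \Big(\frac{|\lambda - a_{ii}|}{R_i'^{\,\alpha}}\Big)^{1/(1-\alpha)} |x_i|^{1/(1-\alpha)} \le \sum_{i=1}^n \sum_{j \ne i} |a_{ij}|\,|x_j|^{1/(1-\alpha)} = \sum_{j=1}^n C_j'\,|x_j|^{1/(1-\alpha)} \tag{6.4.6}$$

If

$$\Big(\frac{|\lambda - a_{ii}|}{R_i'^{\,\alpha}}\Big)^{1/(1-\alpha)} > C_i'$$

for every i such that $x_i \ne 0$, then (6.4.6) could not be correct. Thus, we
conclude that

$$\Big(\frac{|\lambda - a_{ii}|}{R_i'^{\,\alpha}}\Big)^{1/(1-\alpha)} \le C_i'$$

for at least one value of i such that $x_i \ne 0$, and hence
$|\lambda - a_{ii}| \le R_i'^{\,\alpha}\, C_i'^{\,1-\alpha}$. $\square$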
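The region (6.4.4) is simple to evaluate. The sketch below is illustrative only: the 2-by-2 matrix A is an assumed example chosen for this sketch, the helper names are hypothetical, and the eigenvalues come from the quadratic formula. It computes the radii $R_i'^{\,\alpha} C_i'^{\,1-\alpha}$ for several values of $\alpha$ and tests whether each eigenvalue falls in at least one disc.

```python
import cmath

def deleted_sums(A):
    """Deleted absolute row sums R_i' and column sums C_i' of a square matrix."""
    n = len(A)
    rows = [sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n)]
    cols = [sum(abs(A[j][i]) for j in range(n) if j != i) for i in range(n)]
    return rows, cols

def ostrowski_radii(A, alpha):
    """Radii R_i'^alpha * C_i'^(1 - alpha) of the discs in (6.4.4)."""
    rows, cols = deleted_sums(A)
    return [r ** alpha * c ** (1 - alpha) for r, c in zip(rows, cols)]

def in_ostrowski_region(A, alpha, z):
    """True if z lies in at least one of the n discs."""
    radii = ostrowski_radii(A, alpha)
    return any(abs(z - A[i][i]) <= radii[i] for i in range(len(A)))

A = [[1.0, 4.0], [1.0, 6.0]]   # illustrative matrix, chosen for this sketch
half_trace = (A[0][0] + A[1][1]) / 2
disc = cmath.sqrt(half_trace ** 2 - (A[0][0] * A[1][1] - A[0][1] * A[1][0]))
eigenvalues = [half_trace + disc, half_trace - disc]

for alpha in (0.0, 0.5, 1.0):
    inside = [in_ostrowski_region(A, alpha, z) for z in eigenvalues]
    print(alpha, ostrowski_radii(A, alpha), inside)
```

Here alpha = 1 reproduces the row-sum discs, alpha = 0 the column-sum discs, and alpha = 0.5 gives two discs of equal radius for this matrix.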


Exercise. i = [i s] šgori 
ercise. Consider A =|, «| and compare the Geršgorin row and column 


eigenvalue inclusion regions with Ostrowski's region for $\alpha = 1/2$. What
estimate does Ostrowski's theorem give for the spectral radius of A and how
does it compare with the Geršgorin estimates (6.1.5)?




Exercise. What is the Ostrowski version of Corollary (6.1.6)? 


The next result, due to Brauer, is also a generalization of Geršgorin's
theorem, but now the rows are taken two at a time; the geometrical
regions are no longer discs but sets known as ovals of Cassini. The proof
closely parallels the proof of Geršgorin's theorem in that one selects
not just the largest modulus component of the eigenvector, but the two
largest modulus components.


6.4.7 Theorem (Brauer). Let $A = [a_{ij}] \in M_n$. All the eigenvalues of A
are located in the union of n(n-1)/2 ovals of Cassini

$$\bigcup_{i,j=1,\, i \ne j}^n \{z \in C : |z - a_{ii}|\,|z - a_{jj}| \le R_i'R_j'\} \tag{6.4.8}$$
Proof: Let $\lambda$ be an eigenvalue of A, and suppose that $Ax = \lambda x$ with
$x = [x_i] \ne 0$. There is an element of x that has largest absolute value, say $x_p$,
so $|x_p| \ge |x_i|$ for all i = 1, ..., n and $x_p \ne 0$. If all the other entries of x are
zero, then the assumption that $Ax = \lambda x$ means that $a_{pp} = \lambda$. Since all the
diagonal entries of A are included in the region (6.4.8), the eigenvalue $\lambda$ is
in this region whenever its associated eigenvector has only one nonzero
entry.

Now suppose that there are at least two nonzero entries of the
eigenvector x, and let $x_q$ be the component with second largest absolute value;
that is, $|x_p| \ge |x_q| \ge |x_i|$ for all i = 1, ..., n, $i \ne p$, and $x_p \ne 0 \ne x_q$. Then
$Ax = \lambda x$ means that

$$x_p(\lambda - a_{pp}) = \sum_{j=1,\, j \ne p}^n a_{pj}x_j$$

which implies that

$$|x_p|\,|\lambda - a_{pp}| = \Big|\sum_{j \ne p} a_{pj}x_j\Big| \le \sum_{j \ne p} |a_{pj}|\,|x_j| \le \sum_{j \ne p} |a_{pj}|\,|x_q| = R_p'\,|x_q|$$

or

$$|\lambda - a_{pp}| \le R_p'\,\frac{|x_q|}{|x_p|} \tag{6.4.9}$$

But we also have

$$x_q(\lambda - a_{qq}) = \sum_{j=1,\, j \ne q}^n a_{qj}x_j$$

which implies that

$$|x_q|\,|\lambda - a_{qq}| = \Big|\sum_{j \ne q} a_{qj}x_j\Big| \le \sum_{j \ne q} |a_{qj}|\,|x_j| \le \sum_{j \ne q} |a_{qj}|\,|x_p| = R_q'\,|x_p|$$

or

$$|\lambda - a_{qq}| \le R_q'\,\frac{|x_p|}{|x_q|} \tag{6.4.10}$$

Taking the product of (6.4.9) and (6.4.10) permits us to eliminate the
unknown ratios of components of x and obtain

$$|\lambda - a_{pp}|\,|\lambda - a_{qq}| \le R_p'R_q'\,\frac{|x_q|}{|x_p|}\,\frac{|x_p|}{|x_q|} = R_p'R_q'$$

Thus, the eigenvalue $\lambda$ lies in the region (6.4.8). $\square$
Exercise. What is the column sum version of Brauer’s theorem? 


Any theorem about eigenvalue inclusion regions implies (and, indeed,
is implied by) a related theorem about invertibility. One just uses the
inclusion result to set conditions that prohibit z = 0 from being in the
region.


6.4.11 Corollary. If $A = [a_{ij}] \in M_n$, then either of the following
conditions is sufficient that A be invertible:

(a) For some $\alpha \in [0, 1]$, $|a_{ii}| > R_i'^{\,\alpha}\, C_i'^{\,1-\alpha}$ for all i = 1, ..., n
    (Ostrowski)
(b) $|a_{ii}|\,|a_{jj}| > R_i'R_j'$ for all i, j = 1, ..., n, $i \ne j$
    (Brauer)
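Both sufficient conditions are easy to test mechanically. The following sketch uses hypothetical function names and an assumed illustrative 2-by-2 matrix that fails strict row diagonal dominance (condition (a) with $\alpha = 1$) yet satisfies condition (a) with $\alpha = 1/2$ as well as condition (b).

```python
import itertools

def deleted_row_sums(A):
    n = len(A)
    return [sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n)]

def deleted_column_sums(A):
    n = len(A)
    return [sum(abs(A[j][i]) for j in range(n) if j != i) for i in range(n)]

def ostrowski_condition(A, alpha):
    """Condition (a): |a_ii| > R_i'^alpha * C_i'^(1 - alpha) for every i."""
    R, C = deleted_row_sums(A), deleted_column_sums(A)
    return all(abs(A[i][i]) > R[i] ** alpha * C[i] ** (1 - alpha)
               for i in range(len(A)))

def brauer_condition(A):
    """Condition (b): |a_ii| |a_jj| > R_i' R_j' for every pair i != j."""
    R = deleted_row_sums(A)
    return all(abs(A[i][i]) * abs(A[j][j]) > R[i] * R[j]
               for i, j in itertools.combinations(range(len(A)), 2))

A = [[1.0, 2.0], [0.1, 1.0]]          # illustrative: |a_11| < R_1', so row dominance fails
print(ostrowski_condition(A, 1.0))    # row-sum test alone
print(ostrowski_condition(A, 0.5))    # mixed row and column sums
print(brauer_condition(A))            # rows taken two at a time
```

For this matrix the determinant is 0.8, so the two conditions that hold are consistent with A being invertible.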


Exercise. Use (6.4.1) and (6.4.7) to prove (6.4.11). 


Brauer's theorem involves products of rows taken two at a time. An
attractive possibility for further generalization is suggested by the idea of
taking the rows three or more at a time, and considering, for each
m = 1, ..., n, the union of sets of the form

$$\Big\{z \in C : \prod_{j=1}^m |z - a_{i_j i_j}| \le \prod_{j=1}^m R_{i_j}'\Big\}, \quad A = [a_{ij}] \in M_n \tag{6.4.12}$$

For each m, there are $\binom{n}{m}$ sets of this form; m = 1 gives the n Geršgorin
discs and m = 2 gives Brauer's n(n-1)/2 ovals of Cassini. Unfortunately,
for $m \ge 3$ the sets (6.4.12) need not be eigenvalue inclusion regions at
all, as shown by the example


$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{6.4.13}$$

The sets (6.4.12) for m = 3 and m = 4 all collapse to the point z = 1.
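The collapse can be confirmed directly. This is a minimal sketch with a hypothetical helper name: it forms every set (6.4.12) built from m distinct rows of the matrix (6.4.13), using 0-based row indices, and tests the eigenvalues 0, 1, 1, 2 for membership.

```python
import itertools
import math

A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]   # the matrix (6.4.13)

def in_product_region(A, m, z):
    """True if z lies in some set (6.4.12) built from m distinct rows of A."""
    n = len(A)
    R = [sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n)]
    return any(
        math.prod(abs(z - A[i][i]) for i in rows) <= math.prod(R[i] for i in rows)
        for rows in itertools.combinations(range(n), m)
    )

eigenvalues = [0, 1, 1, 2]   # spectrum of (6.4.13)
for m in (1, 2, 3, 4):
    print(m, [in_product_region(A, m, z) for z in eigenvalues])
```

For m = 1 and m = 2 every eigenvalue is captured, while for m = 3 and m = 4 the eigenvalues 0 and 2 fall outside, because every product of three or more deleted row sums contains a zero factor.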


Exercise. Show that the eigenvalues of the matrix in (6.4.13) are $\lambda = 0$, 1,
1, and 2. Sketch the sets (6.4.12) for m = 1, m = 2, and m = 3, 4. Show
that the same phenomenon occurs for all $m \ge 3$ by considering

$$A = \begin{bmatrix} J & 0 \\ 0 & I \end{bmatrix} \in M_{n+2} \tag{6.4.14}$$

where $J = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \in M_2$ and $I \in M_n$ is the n-by-n identity matrix.


Although this example eliminates the most obvious generalization of
Brauer's theorem, it suggests what may be wrong and how to deal with it.
The problem with the region (6.4.12) is that it admits too many products,
some of which may be zero because of zero deleted row sums. Of course,
this can't happen if the matrix A is irreducible; all $R_i' > 0$ in this case.

However, even if A is irreducible, the region (6.4.12) may not be an
eigenvalue inclusion region for A; it may still admit too many products.
Consider the perturbation of (6.4.13) given by

$$A_\varepsilon = \begin{bmatrix} 1 & 1 & \varepsilon & \varepsilon \\ 1 & 1 & 0 & 0 \\ \varepsilon & 0 & 1 & 0 \\ \varepsilon & 0 & 0 & 1 \end{bmatrix}, \quad 1 > \varepsilon \ge 0 \tag{6.4.15}$$

The directed graph $\Gamma(A_\varepsilon)$ of $A_\varepsilon$ is

[Figure: the directed graph $\Gamma(A_\varepsilon)$ on the nodes $P_1$, $P_2$, $P_3$, $P_4$, drawn with solid and dashed arcs]

where the dashed arcs disappear when $\varepsilon = 0$. If $\varepsilon > 0$, $\Gamma(A_\varepsilon)$ is strongly
connected and $A_\varepsilon$ is irreducible. We find that

$$R_1' = 1 + 2\varepsilon, \quad R_2' = 1, \quad R_3' = \varepsilon, \quad R_4' = \varepsilon$$

and $A_\varepsilon$ has eigenvalues

$$\lambda = 1, \; 1, \; 1 + (1 + 2\varepsilon^2)^{1/2}, \text{ and } 1 - (1 + 2\varepsilon^2)^{1/2}$$

Exercise. Verify the above calculations for $A_\varepsilon$.


Since any product of three or more of the $R_i'$ terms contains at least
one factor of $\varepsilon$, the sets (6.4.12) cannot be eigenvalue inclusion regions
for either m = 3 or m = 4 when $\varepsilon$ is small and positive.

Exercise. Show that this conclusion holds for all $m \ge 3$ by considering a
perturbation of (6.4.14) of the same form as (6.4.15).


What intrinsic property of (6.4.13) and (6.4.15) indicates that m=1 
and m=2 are acceptable in (6.4.12), but that m=3 and m=4 are not? 
Richard Brualdi noticed that the directed graphs in each case do not con- 
tain any cycles of length 3 or 4, but do contain cycles of length 1 and 2. 
This turns out to be the key observation in obtaining a correct generaliza- 
tion of Brauer’s theorem. 

Recall that a directed graph $\Gamma$ is strongly connected if and only if from
each node there is a directed path in $\Gamma$ to any other node (and back).

We say that $\Gamma$ is weakly connected if and only if from each node there
is a directed path to some other node and then back. This is equivalent to
the assertion that each node in $\Gamma$ belongs to some nontrivial cycle; a
trivial cycle (or loop) is a directed path of length 1 that begins and ends at
the same node.

In terms of matrices, we know that $\Gamma(A)$ is strongly connected if and
only if A is irreducible. We say that A is weakly irreducible if and only if
$\Gamma(A)$ is weakly connected. Weak irreducibility does not seem to have as
striking a characterization as irreducibility [in terms of something like
the permutation similarity (6.2.21b)], but in terms of the zero-nonzero
structure of the entries of A it is clear that A is weakly irreducible if and
only if for each i = 1, ..., n the ith row of A has at least one nonzero
off-diagonal entry $a_{ij}$, such that there is a sequence $a_{k_1k_2}, a_{k_2k_3}, \ldots, a_{k_{m-1}k_m}$
of nonzero entries of A for which $k_1 = j$ and $k_m = i$. This cumbersome
condition is about half of the requirement (6.2.7) that A have property
SC, and it is perhaps more conveniently stated for computational
purposes in a form analogous to Theorem (6.2.23).


6.4.16 Lemma. If $A \in M_n$, then A is weakly irreducible if and only if
either of the matrices

(a) $B = (I + |A|)^{n-1}$ or

(b) $B = [I + M(A)]^{n-1}$

has the property that for each i = 1, ..., n there is at least one nonzero
off-diagonal entry $b_{ij}$ in the ith row ($j \ne i$) such that $b_{ji}$ is nonzero as
well.
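The criterion in part (a) lends itself to a short computation. The sketch below is one possible realization under stated assumptions: since only the zero-nonzero pattern of $(I + |A|)^{n-1}$ matters and its entries are nonnegative, the power is formed with boolean arithmetic rather than numeric products. The function names are hypothetical, and the irreducibility test (all entries of the power nonzero) is the standard companion criterion included for comparison.

```python
def reachability_pattern(A):
    """Zero-nonzero pattern of (I + |A|)^(n-1), computed with boolean arithmetic."""
    n = len(A)
    step = [[i == j or A[i][j] != 0 for j in range(n)] for i in range(n)]
    pattern = [[i == j for j in range(n)] for i in range(n)]   # (I + |A|)^0
    for _ in range(n - 1):
        pattern = [
            [any(pattern[i][k] and step[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return pattern

def is_weakly_irreducible(A):
    """Each row i needs some j != i with b_ij and b_ji both nonzero."""
    B = reachability_pattern(A)
    n = len(A)
    return all(any(j != i and B[i][j] and B[j][i] for j in range(n)) for i in range(n))

def is_irreducible(A):
    """Every entry of (I + |A|)^(n-1) is nonzero."""
    B = reachability_pattern(A)
    return all(all(row) for row in B)

matrix_6_4_13 = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
two_blocks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]   # illustrative
print(is_weakly_irreducible(matrix_6_4_13), is_irreducible(matrix_6_4_13))
print(is_weakly_irreducible(two_blocks), is_irreducible(two_blocks))
```

The second matrix shows that weak irreducibility is the weaker property: every node lies on a nontrivial cycle, yet the graph is not strongly connected.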


Exercise. Prove Lemma (6.4.16). Hint: Use the ideas in (6.2.19). 


Exercise. Suppose $A \in M_n$ and let $B \in M_n$ be defined by either (a) or (b) in
(6.4.16). Show that A is weakly irreducible if and only if $\Gamma(B)$ has the
property that every node belongs to a cycle of length 2. What is the
corresponding property for irreducible? Which property is weaker? Recall
that a cycle is simple by definition; only the initial (which is the same as
the final) node can appear in the list of nodes more than once.

Exercise. If $A \in M_n$ is weakly irreducible, show that all $R_i' > 0$ and all
$C_i' > 0$.

A preorder on a set S is a relation R defined between all pairs of
points of S such that for any pair of elements $s, t \in S$, either sRt or tRs or
both. A preorder must also be reflexive (sRs for every $s \in S$) and
transitive (if sRt and tRu, then sRu). A preorder might not be symmetric (sRt
whenever tRs), and it could be that sRt and tRs without s = t. A point z
in a subset $S_0$ of S is said to be a maximal element of $S_0$ if sRz for all
$s \in S_0$.


Exercise. Let S be any nonempty set of complex numbers. Show that the
relation between pairs of complex numbers $z, w \in S$ defined by

$$zRw \quad \text{if and only if} \quad |z| \le |w|$$

is a preorder on S.


6.4.17 Lemma. Let S be a nonempty finite set on which there is defined
a preorder. Then S contains at least one maximal element.

Proof: Arrange the elements in any order $s_1, \ldots, s_k$. Set $s = s_1$. If $s_2 R s$,
then leave s alone, but if not, then set $s = s_2$. Continue this process with
the rest of the elements. The final value of s is a maximal element. $\square$

If $\Gamma$ is a directed graph and if $P_i$ is a node of $\Gamma$, we define $\Gamma_{out}(P_i)$ to
be the set of nodes different from $P_i$ that can be reached from $P_i$ by some
path of length 1. Notice that if $\Gamma$ is weakly connected, then
$\Gamma_{out}(P_i)$ is nonempty for every node $P_i \in \Gamma$.

We denote by C(A) the set of nontrivial cycles $\gamma$ in the directed graph
$\Gamma(A)$. A nontrivial cycle is one that contains at least two distinct nodes;
that is, it is a (simple directed) cycle that is not a loop. For the matrix
(6.4.13), C(A) consists of the single cycle $\gamma = P_1P_2, P_2P_1$, while for the
matrix (6.4.15) there are three separate nontrivial cycles, all of length 2.


6.4.18 Theorem (Brualdi). If $A = [a_{ij}] \in M_n$ is weakly irreducible, then every eigenvalue of $A$ is contained in the region

$$\bigcup_{\gamma \in C(A)} \Big\{ z \in \mathbf{C} : \prod_{P_i \in \gamma} |z - a_{ii}| \le \prod_{P_i \in \gamma} R_i' \Big\} \tag{6.4.19}$$

The notation means that if $\gamma = P_{i_1}P_{i_2}, \ldots, P_{i_k}P_{i_{k+1}}$ is a nontrivial cycle with $P_{i_{k+1}} = P_{i_1}$, then each of the products in (6.4.19) contains exactly $k$ terms, and the index $i$ takes on the $k$ values $i_1, i_2, \ldots, i_k$.
Proof: Suppose $\lambda$ is an eigenvalue of $A$ and suppose $\lambda = a_{ii}$ for some diagonal entry of $A$. Then $\lambda$ is obviously in the region (6.4.19). In fact, all $R_i' > 0$ since $A$ is weakly irreducible, so in this case $\lambda$ lies in the interior of the region (6.4.19). If each of the eigenvalues of $A$ is equal to a diagonal entry of $A$, then all the eigenvalues of $A$ lie in the interior of the region (6.4.19) and we are done.

For the rest of the argument, we suppose that $\lambda$ is an eigenvalue of $A$ and $\lambda \neq a_{ii}$ for all $i = 1, \ldots, n$. Let $Ax = \lambda x$ for some nonzero $x = [x_i] \in \mathbf{C}^n$. Define a preorder $R$ on the nodes of $\Gamma$ by

$$P_i R P_j \quad \text{if and only if} \quad |x_i| \le |x_j| \tag{6.4.20}$$


We shall show that there exists a cycle $\gamma'$ in $\Gamma(A)$ with the following three properties:

(a) $\gamma' = P_{i_1}P_{i_2}, P_{i_2}P_{i_3}, \ldots, P_{i_k}P_{i_{k+1}}$ is a nontrivial (simple directed) cycle with $k \ge 2$ and $P_{i_{k+1}} = P_{i_1}$.

(b) For each $j = 1, \ldots, k$, the node $P_{i_{j+1}}$ is a maximal node in $\Gamma_{\text{out}}(P_{i_j})$; that is, $|x_{i_{j+1}}| \ge |x_m|$ for all $m$ such that $P_m \in \Gamma_{\text{out}}(P_{i_j})$. (6.4.21)

(c) All $x_{i_j} \neq 0$, $j = 1, \ldots, k$.




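If $\gamma'$ is a cycle that satisfies the conditions (6.4.21), then $Ax = \lambda x$ implies that for any $j = 1, \ldots, k$ we have

$$(\lambda - a_{i_j i_j})\, x_{i_j} = \sum_{\substack{m=1 \\ m \neq i_j}}^{n} a_{i_j m} x_m = \sum_{P_m \in \Gamma_{\text{out}}(P_{i_j})} a_{i_j m} x_m$$

and hence

$$|\lambda - a_{i_j i_j}|\, |x_{i_j}| = \Big| \sum_{P_m \in \Gamma_{\text{out}}(P_{i_j})} a_{i_j m} x_m \Big| \le \sum_{P_m \in \Gamma_{\text{out}}(P_{i_j})} |a_{i_j m}|\, |x_m| \tag{6.4.22}$$

$$\le \sum_{P_m \in \Gamma_{\text{out}}(P_{i_j})} |a_{i_j m}|\, |x_{i_{j+1}}| \tag{6.4.22a}$$

$$= R_{i_j}'\, |x_{i_{j+1}}|$$

If we now take the product of the inequalities (6.4.22) over all the nodes in $\gamma'$, we obtain

$$\prod_{j=1}^{k} |\lambda - a_{i_j i_j}|\, |x_{i_j}| \le \prod_{j=1}^{k} R_{i_j}'\, |x_{i_{j+1}}| \tag{6.4.23}$$

But

$$\prod_{j=1}^{k} |\lambda - a_{i_j i_j}| = \prod_{P_i \in \gamma'} |\lambda - a_{ii}| \quad \text{and} \quad \prod_{j=1}^{k} R_{i_j}' = \prod_{P_i \in \gamma'} R_i'$$

and since $P_{i_{k+1}} = P_{i_1}$, we also have $x_{i_{k+1}} = x_{i_1}$. Therefore,

$$\prod_{j=1}^{k} |x_{i_j}| = \prod_{j=1}^{k} |x_{i_{j+1}}| \neq 0 \tag{6.4.24}$$

Thus, dividing (6.4.23) by (6.4.24), we obtain

$$\prod_{P_i \in \gamma'} |\lambda - a_{ii}| \le \prod_{P_i \in \gamma'} R_i' \tag{6.4.25}$$

Since $\gamma'$ is a nontrivial cycle in $\Gamma(A)$, the eigenvalue $\lambda$ must therefore lie in the region (6.4.19).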


We must now show that there is a cycle $\gamma'$ that satisfies the conditions (6.4.21). Let $i$ be any index for which $x_i \neq 0$. Then from the identity

$$(\lambda - a_{ii})\, x_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij} x_j = \sum_{P_j \in \Gamma_{\text{out}}(P_i)} a_{ij} x_j$$

and the fact that $x_i \neq 0$ and $\lambda - a_{ii} \neq 0$, we see that the left-hand side is nonzero and hence among the nodes of $\Gamma_{\text{out}}(P_i)$ [those $P_j$ such that $a_{ij} \neq 0$ and $j \neq i$; $\Gamma_{\text{out}}(P_i)$ is nonempty because $\Gamma(A)$ is weakly connected] there must be at least one for which the corresponding eigenvector component $x_j$ is nonzero. Let $P_{i_1} = P_i$, and let $P_{i_2}$ be a maximal node among the nodes in $\Gamma_{\text{out}}(P_{i_1})$, that is, $|x_{i_2}| \ge |x_m|$ for all $m$ such that $P_m \in \Gamma_{\text{out}}(P_{i_1})$. We are guaranteed that $x_{i_2} \neq 0$.

Suppose that the preceding construction has produced a directed path $P_{i_1}P_{i_2}, P_{i_2}P_{i_3}, \ldots, P_{i_{j-1}}P_{i_j}$ of length $j - 1$ that satisfies conditions (b) and (c) of (6.4.21); we have just done this for $j = 2$. Then

$$(\lambda - a_{i_j i_j})\, x_{i_j} = \sum_{P_m \in \Gamma_{\text{out}}(P_{i_j})} a_{i_j m} x_m$$

and the left-hand side is nonzero, so there must be at least one node in $\Gamma_{\text{out}}(P_{i_j})$ [nonempty because $\Gamma(A)$ is weakly connected] for which the corresponding eigenvector component is nonzero. Thus, if we choose $P_{i_{j+1}}$ to be a maximal node in $\Gamma_{\text{out}}(P_{i_j})$, we are guaranteed that $x_{i_{j+1}} \neq 0$.

Because there are only finitely many nodes in $\Gamma(A)$, this construction for $j = 2, 3, \ldots$ must eventually produce a first maximal node $P_{i_q} \in \Gamma_{\text{out}}(P_{i_{q-1}})$ which was produced as a node $P_{i_p}$ at some previous step ($2 \le p + 1 < q$). Then $\gamma' = P_{i_p}P_{i_{p+1}}, P_{i_{p+1}}P_{i_{p+2}}, \ldots, P_{i_{q-1}}P_{i_q}$ is a cycle in $\Gamma(A)$ which satisfies all three conditions in (6.4.21). □
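As a numerical illustration of Theorem (6.4.18), the following Python sketch enumerates the nontrivial cycles of $\Gamma(A)$ for a small matrix and tests whether a given point lies in the region (6.4.19). The function names are hypothetical, the brute-force cycle search is only sensible for small $n$, and the tolerance `tol` is an assumption added to absorb rounding for points on the boundary.

```python
import cmath
from math import prod


def deleted_row_sums(A):
    """R_i' = sum of |a_ij| over j != i."""
    n = len(A)
    return [sum(abs(A[i][j]) for j in range(n) if j != i) for i in range(n)]


def nontrivial_cycles(A):
    """Simple directed cycles of length >= 2 in Gamma(A).

    Each cycle is a tuple of node indices starting at its smallest node,
    so every directed cycle is listed exactly once.
    """
    n = len(A)
    cycles = []

    def extend(start, path):
        last = path[-1]
        for nxt in range(n):
            if nxt == last or A[last][nxt] == 0:
                continue                      # loops and missing arcs are ignored
            if nxt == start and len(path) >= 2:
                cycles.append(tuple(path))    # the path closes up into a cycle
            elif nxt > start and nxt not in path:
                extend(start, path + [nxt])

    for start in range(n):
        extend(start, [start])
    return cycles


def in_brualdi_region(A, z, tol=1e-12):
    """True if z lies in at least one of the sets whose union is (6.4.19)."""
    R = deleted_row_sums(A)
    return any(
        prod(abs(z - A[i][i]) for i in cyc) <= prod(R[i] for i in cyc) + tol
        for cyc in nontrivial_cycles(A)
    )


# A = 4I + P with P a cyclic permutation matrix; its eigenvalues are
# 4 + (cube roots of unity), and Gamma(A) has the single cycle (0, 1, 2).
A = [[4, 1, 0],
     [0, 4, 1],
     [1, 0, 4]]
eigenvalues = [4 + cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
inside = [in_brualdi_region(A, lam) for lam in eigenvalues]
```

For this example the region is the single set $|z - 4|^3 \le 1$, and the eigenvalues sit on its boundary, which is the situation addressed next for irreducible matrices.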


Brualdi’s theorem has a sharper form when A is actually irreducible; it 
is the generalized Brauer (6.4.7) version of Theorem (6.2.26). 


6.4.26 Theorem (Brualdi). Let $A = [a_{ij}] \in M_n$ be irreducible. A boundary point $\lambda$ of the region (6.4.19) can be an eigenvalue of $A$ only if the boundary of each set

$$\Big\{ z \in \mathbf{C} : \prod_{P_i \in \gamma} |z - a_{ii}| \le \prod_{P_i \in \gamma} R_i' \Big\} \tag{6.4.27}$$

passes through $\lambda$ for every nontrivial cycle $\gamma \in C(A)$.


Proof: Since all $R_i' > 0$, if $\lambda = a_{ii}$ for any $i = 1, 2, \ldots, n$, then $\lambda$ could not be on the boundary of the region (6.4.27). Thus, we may assume that $\lambda \neq a_{ii}$ for all $i = 1, \ldots, n$ and we may continue the argument used in Brualdi's theorem (6.4.18) with the same notation, but with the additional assumption that $\lambda$ is an eigenvalue of $A$ that lies on the boundary of the region (6.4.19). Just as in the proof of Lemma (6.2.3), $\lambda$ must satisfy the inequality

$$\prod_{P_i \in \gamma} |\lambda - a_{ii}| \ge \prod_{P_i \in \gamma} R_i'$$

for all nontrivial cycles $\gamma \in C(A)$ [with equality for at least one $\gamma \in C(A)$]. Comparing this inequality with (6.4.25), we see that

$$\prod_{P_i \in \gamma'} |\lambda - a_{ii}| = \prod_{P_i \in \gamma'} R_i' \tag{6.4.28}$$

for the particular cycle $\gamma'$ constructed in the proof of (6.4.18). Thus, the inequality in (6.4.23) must be an equality, as must both of the inequalities in (6.4.22) for all $j = 1, 2, \ldots, k$. In particular, the inequality (6.4.22a) must be an equality, and hence for each $P_{i_j} \in \gamma'$ and for all $m$ such that $P_m \in \Gamma_{\text{out}}(P_{i_j})$, $|x_m| = |x_{i_{j+1}}| = c_{i_j} = \text{constant}$. Notice that this conclusion follows for any cycle that satisfies the conditions (6.4.21).

Now define the set

$$K = \{ P_i \in \Gamma(A) : |x_m| = c_i = \text{constant for all } m \text{ such that } P_m \in \Gamma_{\text{out}}(P_i) \}$$

We know that $K$ is not empty because all the nodes of $\gamma'$ are in $K$. We would like to show that all the nodes of $\Gamma(A)$ are in $K$.

Suppose there is a node $P_q$ of $\Gamma(A)$ that is not in $K$. Because $\Gamma(A)$ is strongly connected, there is at least one directed path in $\Gamma(A)$ from each node of $K$ to this external node $P_q$. If we select from all such directed paths a path with shortest length, then its first arc must be from a node in $K$ to a node $P_f$ that is not in $K$. If we use the same preorder on the nodes of $\Gamma(A)$ that we used in the proof of Theorem (6.4.18), then we may employ the same construction used in the proof of Theorem (6.4.18): start with the node $P_{j_1} = P_f$, select a maximal node $P_{j_2} \in \Gamma_{\text{out}}(P_{j_1})$, select a maximal node $P_{j_3} \in \Gamma_{\text{out}}(P_{j_2})$, and so on. At each step, $\Gamma_{\text{out}}(P_{j_i})$ is nonempty because $\Gamma(A)$ is weakly (even strongly) connected, and the maximal node satisfies condition (c) of (6.4.21) for the same reason as before.

If, at some step of this construction, we have a choice between selecting a maximal node which is in $K$ or not in $K$, we shall always choose one that is not in $K$. If at any step, all the maximal nodes from which we may choose are in $K$, choose any one of them and then follow a directed path of shortest length (necessarily in $K$) to a first node that is not in $K$ and resume selecting maximal nodes as before. By definition of $K$, any directed path in $K$ will have the property that each node is a maximal node in $\Gamma_{\text{out}}$ of its predecessor node [condition (b) of (6.4.21)]. Because the complement of $K$ has only finitely many nodes, this construction must ultimately produce a first maximal node in the complement of $K$ that was produced as a node at some previous step. The directed path between the first and second occurrences of this node in the construction will be a nontrivial directed cycle, which may not be simple because of the way we have forced the path to leave $K$ whenever the construction leads to a node in $K$. There may be finitely many cycles in the part of the path that lies within $K$, but they can be pruned off to leave a simple directed cycle $\gamma''$, which satisfies the conditions (6.4.21) and contains at least one node which is not in $K$.

Since the cycle $\gamma''$ satisfies the conditions (6.4.21), it can be used in place of the cycle $\gamma'$ in the proof of Theorem (6.4.18). By the argument in the first paragraph of the present proof, we conclude that $|x_m| = c_{j_i} = \text{constant}$ for all $P_m \in \Gamma_{\text{out}}(P_{j_i})$ for all $P_{j_i} \in \gamma''$. Therefore, every node of $\gamma''$ is in $K$, a contradiction to the conclusion that $\gamma''$ contains at least one node not in $K$. This shows that there can be no node of $\Gamma(A)$ that is not in $K$.

If $\gamma$ is any nontrivial (simple directed) cycle in $\Gamma(A)$, it will automatically satisfy the conditions (6.4.21) because all its nodes are in $K$. It may therefore be used in place of $\gamma'$ in the proof of Theorem (6.4.18) and hence it may be used in place of $\gamma'$ in (6.4.28). This is the desired conclusion: The boundary of every set (6.4.27) passes through $\lambda$. □


6.4.29 Corollary. If $A \in M_n$, then either of the following conditions is sufficient for $A$ to be invertible:

(a) $A$ is weakly irreducible and

$$\prod_{P_i \in \gamma} |a_{ii}| > \prod_{P_i \in \gamma} R_i'$$

for every nontrivial cycle $\gamma \in C(A)$; or

(b) $A$ is irreducible and

$$\prod_{P_i \in \gamma} |a_{ii}| \ge \prod_{P_i \in \gamma} R_i'$$

for every nontrivial cycle $\gamma \in C(A)$, with strict inequality for at least one cycle.


Problems

1. Show that if $A = [a_{ij}] \in M_n$ satisfies Brauer's condition (6.4.11b), then $|a_{ii}| > R_i'$ for all but at most one value of $i = 1, \ldots, n$. Thus, Brauer's condition is only slightly weaker than the Levy-Desplanques condition (6.1.10a) of strict diagonal dominance. How is this related to (6.1.11)?


2. Sh =[? 3] is inverti 
2. Show that A= f 5] is invertible by both conditions (6.4.11) but 


either the Levy-Desplan ae 
7 a ques condition (6.1.10a 
Invertibility. What about the column form of 1 ne suarantee 


3. Show that every irreducible matrix $A \in M_n$ with $n \ge 2$ is weakly irreducible.



4. Provide the details for the proof of Corollary (6.4.29). Hint: Use the same arguments as in (6.1.10) and (6.2.6).

5. Show that $A \in M_n$ is weakly irreducible if and only if $A$ is not permutation similar to a block triangular (0.9.4) matrix one of whose diagonal blocks is 1-by-1.

Further Reading. For more details about inclusion regions and many references to the original literature, see R. Brualdi, "Matrices, Eigenvalues, and Directed Graphs," Lin. Multilin. Alg. 11 (1982), 143-165.


CHAPTER 7

Positive definite matrices


7.0 Introduction


A class of Hermitian matrices with a special positivity property arises 
naturally in many applications. Hermitian (and, in particular, real sym- 
metric) matrices with this positivity property also provide one generaliza- 
tion to matrices of the notion of a positive number. This observation 
often provides insight into the properties and applications of positive 
definite matrices. The following are examples of ways in which these 
special Hermitian matrices arise. 


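Hessians, minimization, and convexity

Let $f(x)$ be a smooth real-valued function on some domain $D \subset \mathbf{R}^n$. If $y = [y_i]$ is an interior point of $D$, then Taylor's theorem says that

$$f(x) = f(y) + \sum_{i=1}^{n} (x_i - y_i) \frac{\partial f}{\partial x_i}\bigg|_y + \frac{1}{2} \sum_{i,j=1}^{n} (x_i - y_i)(x_j - y_j) \frac{\partial^2 f}{\partial x_i\, \partial x_j}\bigg|_y + \cdots$$

for points $x \in D$ which are near $y$. If $y$ is a critical point of $f$, then all the first-order partial derivatives vanish at $y$ and we have the expression

$$f(x) - f(y) = \frac{1}{2} \sum_{i,j=1}^{n} (x_i - y_i)(x_j - y_j) \frac{\partial^2 f}{\partial x_i\, \partial x_j}\bigg|_y + \cdots = \frac{1}{2} (x - y)^T H(f; y)(x - y) + \cdots$$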


for the behavior of $f$ near $y$. The $n$-by-$n$ matrix

$$H(f; y) \equiv \left[ \frac{\partial^2 f}{\partial x_i\, \partial x_j}\bigg|_y \right]$$

is called the Hessian of $f$ at $y$; it is a symmetric matrix because of equality of the mixed partial derivatives of $f$. If the quadratic form

$$z^T H(f; y)\, z, \qquad z \neq 0, \quad z \in \mathbf{R}^n \tag{7.0.1}$$

is always positive, then $y$ is a relative minimum for $f$. If this quadratic form is always negative, then $y$ is a relative maximum for $f$. Of course, this quadratic form might not have a definite sign for all nonzero $z \in \mathbf{R}^n$, in which case the nature of the critical point $y$ is not determined. In the case $n = 1$, these criteria are just the usual second derivative test for a relative minimum or a maximum. The third possibility occurs for $n = 1$ only at a point of inflection; when $n > 1$, the situation can be much more complicated.

If the quadratic form (7.0.1) is nonnegative at all points of $D$ (not just at the critical points of $f$), then $f$ is a convex function in $D$. This is a direct generalization of the familiar situation when $n = 1$.


Variance-covariance matrices

Let $X_1, X_2, \ldots, X_n$ be real or complex random variables with finite second moments on some probability space with expectation functional $E$, and suppose that $\mu_i = E(X_i)$ are the respective means. The covariance matrix of the random vector $X = (X_1, \ldots, X_n)^T$ is the matrix $A = [a_{ij}]$ in which

$$a_{ij} = E\big[\overline{(X_i - \mu_i)}\,(X_j - \mu_j)\big], \qquad i, j = 1, \ldots, n$$

It is apparent that $A$ is Hermitian, and one computes easily that if $z = [z_i] \in \mathbf{C}^n$, then

$$z^*Az = E\Big[ \sum_{i,j=1}^{n} \bar z_i\, \overline{(X_i - \mu_i)}\; z_j (X_j - \mu_j) \Big] = E\, \Big| \sum_{i=1}^{n} z_i (X_i - \mu_i) \Big|^2 \ge 0$$

The only properties of the expectation functional that are involved in this observation are its linearity, homogeneity, and nonnegativity; that is, $E[Y] \ge 0$ whenever $Y$ is a nonnegative random variable.

The same observation can be made without recourse to probabilistic language. If one has a family of complex valued functions $f_1, f_2, \ldots, f_n$ on the real line, if $g$ is a real-valued function, and if all the integrals

$$a_{ij} = \int_{-\infty}^{\infty} \overline{f_i(x)}\, f_j(x)\, g(x)\, dx, \qquad i, j = 1, \ldots, n$$

converge, then the matrix $A = [a_{ij}]$ is Hermitian. One computes easily that

$$z^*Az = \sum_{i,j=1}^{n} \int_{-\infty}^{\infty} \bar z_i\, \overline{f_i(x)}\; z_j f_j(x)\, g(x)\, dx = \int_{-\infty}^{\infty} \Big| \sum_{i=1}^{n} z_i f_i(x) \Big|^2 g(x)\, dx$$

so this quadratic form will always be nonnegative if $g(x)$ is a nonnegative function.


Algebraic moments of nonnegative functions

Let $f(x)$ be an absolutely integrable real-valued function on the unit interval $[0, 1]$ and consider the numbers

$$a_k = \int_0^1 x^k f(x)\, dx \tag{7.0.2}$$

The sequence $a_0, a_1, a_2, \ldots$ is said to be a Hausdorff moment sequence, and it is naturally associated with the real quadratic form

$$\sum_{j,k=0}^{n} a_{j+k}\, z_j z_k = \sum_{j,k=0}^{n} \int_0^1 x^{j+k} z_j z_k f(x)\, dx = \int_0^1 \Big( \sum_{k=0}^{n} z_k x^k \Big)^2 f(x)\, dx \tag{7.0.3}$$

If we set $A = [a_{i+j}]$, then $A$ will be a symmetric real matrix and we shall have $z^TAz \ge 0$ for all $z \in \mathbf{R}^{n+1}$ if $f(x) \ge 0$ for all $x \in [0, 1]$. This is true for each $n = 1, 2, \ldots$. A matrix with the structure of $A$ (i.e., the elements $a_{ij}$ are a function only of $i + j$) is called a Hankel matrix, whether or not its quadratic form is nonnegative. See Section (0.9.8).


Trigonometric moments of nonnegative functions

Let $f(\theta)$ be an absolutely integrable real-valued function on $[0, 2\pi]$ and consider the numbers

$$a_k = \int_0^{2\pi} e^{ik\theta} f(\theta)\, d\theta, \qquad k = 0, \pm 1, \pm 2, \ldots \tag{7.0.4}$$

The sequence $a_0, a_1, a_{-1}, a_2, a_{-2}, \ldots$ is said to be a Toeplitz moment sequence, and it is naturally associated with the quadratic form

$$\sum_{j,k=0}^{n} a_{j-k}\, z_j \bar z_k = \sum_{j,k=0}^{n} \int_0^{2\pi} e^{i(j-k)\theta} z_j \bar z_k f(\theta)\, d\theta = \int_0^{2\pi} \Big| \sum_{k=0}^{n} z_k e^{ik\theta} \Big|^2 f(\theta)\, d\theta \tag{7.0.5}$$

If we set $A = [a_{i-j}]$, then $A$ will be a Hermitian matrix and we shall have $z^*Az \ge 0$ for all $z \in \mathbf{C}^{n+1}$ if $f(\theta) \ge 0$ for all $\theta \in [0, 2\pi]$. This is true for each $n = 1, 2, \ldots$. A matrix which has the structure of $A$ (i.e., the elements $a_{ij}$ are a function only of $i - j$) is called a Toeplitz matrix, whether or not its quadratic form is nonnegative. See Section (0.9.7). It is a fact (Bochner's theorem) that nonnegativity of the quadratic form (7.0.5) is both necessary and sufficient for the numbers $a_k$ to be generated by a slight modification of the formula (7.0.4) (in which a nonnegative measure $d\mu$ replaces $f(\theta)\, d\theta$).
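The trigonometric moment construction can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the moments (7.0.4) are approximated by an equally spaced rectangle rule (an assumed quadrature, adequate here because the integrand is a low-degree trigonometric polynomial), and the sample function $f(\theta) = 1 + \cos\theta$ is simply one convenient nonnegative choice.

```python
import cmath
from math import cos, pi


def trig_moment(f, k, samples=2048):
    """Approximate a_k = integral over [0, 2 pi] of e^{ik theta} f(theta)."""
    h = 2 * pi / samples
    return h * sum(cmath.exp(1j * k * m * h) * f(m * h) for m in range(samples))


def toeplitz_form(f, z):
    """Evaluate sum_{j,k} a_{j-k} z_j conj(z_k), the left side of (7.0.5)."""
    n = len(z)
    a = {d: trig_moment(f, d) for d in range(-(n - 1), n)}
    return sum(a[j - k] * z[j] * z[k].conjugate()
               for j in range(n) for k in range(n))


def f(theta):
    return 1 + cos(theta)            # nonnegative on [0, 2 pi]


z = [1 + 0j, -1 + 2j, 0.5j]          # an arbitrary test vector
q = toeplitz_form(f, z)              # real and nonnegative up to rounding
```

Because the moments depend only on the difference $j - k$, they are computed once per diagonal and stored in a dictionary, which mirrors the Toeplitz structure of $A$.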


Discretization and difference schemes for numerical 
solution of differential equations 


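Suppose we have a two-point boundary value problem of the form

$$-y''(x) + \sigma(x)\, y(x) = f(x), \qquad 0 \le x \le 1$$
$$y(0) = \alpha$$
$$y(1) = \beta$$

where $\alpha$ and $\beta$ are given real constants, and $f(x)$ and $\sigma(x)$ are given real-valued functions. If we discretize this problem and look only for the values of $y(kh) \equiv y_k$, $k = 0, 1, \ldots, n+1$, and if we use a divided difference approximation to the derivative term

$$y''(kh) \approx \frac{y((k+1)h) - 2y(kh) + y((k-1)h)}{h^2} = \frac{y_{k+1} - 2y_k + y_{k-1}}{h^2}$$

we obtain a system of linear equations

$$\frac{-y_{k+1} + 2y_k - y_{k-1}}{h^2} + \sigma_k y_k = f_k, \qquad k = 1, 2, \ldots, n$$
$$y_0 = \alpha$$
$$y_{n+1} = \beta$$

Here we have taken $h = 1/(n+1)$ for $n$ a positive integer, $y_k = y(kh)$, $\sigma_k = \sigma(kh)$, and $f_k = f(kh)$. The boundary conditions can be incorporated into the first ($k = 1$) and last ($k = n$) equations to give the system

$$(2 + h^2\sigma_1)\, y_1 - y_2 = h^2 f_1 + \alpha$$
$$-y_{k-1} + (2 + h^2\sigma_k)\, y_k - y_{k+1} = h^2 f_k, \qquad k = 2, 3, \ldots, n-1$$
$$-y_{n-1} + (2 + h^2\sigma_n)\, y_n = h^2 f_n + \beta$$

which can be written more compactly as $Ay = w$, where $y = [y_k] \in \mathbf{R}^n$, $w = [h^2 f_1 + \alpha, h^2 f_2, \ldots, h^2 f_{n-1}, h^2 f_n + \beta]^T \in \mathbf{R}^n$, and $A \in M_n$ is the tridiagonal matrix

$$A = \begin{bmatrix} 2 + h^2\sigma_1 & -1 & & & 0 \\ -1 & 2 + h^2\sigma_2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 + h^2\sigma_{n-1} & -1 \\ 0 & & & -1 & 2 + h^2\sigma_n \end{bmatrix} \tag{7.0.6}$$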

Notice that A is a real symmetric tridiagonal matrix regardless of the 
values of o(x), but if we want to be able to solve Ay =w for any given 
right-hand side, then we must impose some condition on o(x) to ensure 
that A is nonsingular. 

It is easy to compute the real quadratic form associated with $A$:

$$x^TAx = \Big[ x_1^2 + \sum_{i=1}^{n-1} (x_i - x_{i+1})^2 + x_n^2 \Big] + h^2 \sum_{i=1}^{n} \sigma_i x_i^2$$

The first group of three terms is nonnegative and can vanish only if the components of $x$ are all equal, and equal to zero. If $\sigma(x) \ge 0$, then the last sum is nonnegative and

$$x^TAx \ge \Big[ x_1^2 + \sum_{i=1}^{n-1} (x_i - x_{i+1})^2 + x_n^2 \Big] \ge 0 \tag{7.0.7}$$

If $A$ is singular, then there is some nonzero vector $\hat x \in \mathbf{R}^n$ such that $A\hat x = 0$, and hence $\hat x^TA\hat x = 0$. But then the central group of terms in (7.0.7) must vanish, which implies that $\hat x = 0$. Thus, if $\sigma(x) \ge 0$, the matrix $A$ is nonsingular and the discretized boundary value problem can be solved for arbitrary boundary conditions $\alpha$ and $\beta$.
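The discretization above can be sketched in Python as follows. The function names are hypothetical, and the solver is ordinary elimination without pivoting for a tridiagonal system, which is an assumption that is reasonable here because $A$ is diagonally dominant when $\sigma(x) \ge 0$; it is not a method prescribed by the text.

```python
def tridiagonal_system(sigma, f, alpha, beta, n):
    """Diagonal of A in (7.0.6) and right-hand side w, with h = 1/(n+1).

    All off-diagonal entries of A are -1, so only the diagonal is stored.
    """
    h = 1.0 / (n + 1)
    diag = [2 + h * h * sigma(k * h) for k in range(1, n + 1)]
    w = [h * h * f(k * h) for k in range(1, n + 1)]
    w[0] += alpha                     # boundary value folded into equation k = 1
    w[-1] += beta                     # boundary value folded into equation k = n
    return diag, w


def quadratic_form(diag, x):
    """x^T A x computed directly from the entries of A."""
    n = len(x)
    return (sum(diag[i] * x[i] ** 2 for i in range(n))
            - 2 * sum(x[i] * x[i + 1] for i in range(n - 1)))


def solve_tridiagonal(diag, w):
    """Solve Ay = w by forward elimination and back substitution."""
    n = len(diag)
    d, r = diag[:], w[:]
    for i in range(1, n):
        m = -1.0 / d[i - 1]           # multiplier that clears the subdiagonal -1
        d[i] += m                     # d_i - m * (-1)
        r[i] -= m * r[i - 1]
    y = [0.0] * n
    y[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        y[i] = (r[i] + y[i + 1]) / d[i]
    return y


# Test problem: -y'' + y = x with y(0) = 0, y(1) = 1, whose solution is y = x.
diag, w = tridiagonal_system(lambda x: 1.0, lambda x: x, 0.0, 1.0, 9)
y = solve_tridiagonal(diag, w)        # approximates y(kh) = kh for k = 1..9
```

The helper `quadratic_form` can be compared against the bracketed expression in the text to confirm the identity for $x^TAx$ on sample vectors.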

This is a typical situation in the study of numerical solutions of ordi- 
nary or partial differential equations. For computational stability it is 
desirable to design a discretization of a differential equation problem that 
leads to a system of linear equations Ay = w in which A is positive defi- 
nite, and it is usually possible to do so when the differential equations are 
elliptic. 

Matrices with the special positivity property illustrated in these ex- 
amples are the object of study in this chapter. These matrices arise in 




many applications: in harmonic analysis, in complex analysis, in the 
theory of vibrations of mechanical systems, and in other areas of matrix 
theory such as the singular value decomposition and the solution of 
linear least-squares problems. 


Problems 


1. If the sequence $a_k$ is generated by the formula (7.0.2) with a nonnegative function $f$, show that the quadratic forms

$$\sum_{i,j=1}^{n} a_{i+j+1}\, z_i z_j \quad \text{and} \quad \sum_{i,j=1}^{n} [a_{i+j} - a_{i+j+1}]\, z_i z_j, \qquad z = [z_i] \in \mathbf{R}^n$$

are both nonnegative.


2. Make a sketch illustrating which diagonals are constant in a Hankel 
matrix. Do the same for a Toeplitz matrix. 


3. Show that the matrix $A$ in (7.0.6) is always irreducible, and that it is irreducibly diagonally dominant if $\sigma(x) \ge 0$. Use Corollary (6.2.27) to show that $A$ is nonsingular and that all the eigenvalues of $A$ are positive.


Further Readings. For a short survey of facts about real positive definite 
matrices see C. R. Johnson, “Positive Definite Matrices,” Amer. Math.
Monthly 77 (1970), 259-264. Other surveys that focus on different areas
involving positive definite matrices and contain numerous references are 
O. Taussky, “Positive Definite Matrices,” pp. 309-319 of Inequalities, ed. 
O. Shisha, Academic Press, New York, 1967; and O. Taussky, “Positive 
Definite Matrices and Their Role in the Study of the Characteristic Roots 
of General Matrices,” Advan. Math. 2(1968), 175-186. 


7.1 Definitions and properties 
An $n$-by-$n$ Hermitian matrix $A$ is said to be positive definite if

$$x^*Ax > 0 \quad \text{for all nonzero } x \in \mathbf{C}^n \tag{7.1.1}$$

If the strict inequality required in (7.1.1) is weakened to $x^*Ax \ge 0$, then $A$ is said to be positive semidefinite. Implicit in these defining inequalities is the observation that if $A$ is Hermitian, the left-hand side of (7.1.1) is always a real number. Of course, if $A$ is positive definite, then it is also positive semidefinite.
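The quadratic form in (7.1.1) is easy to evaluate directly. The following Python sketch is only an illustration with hypothetical names; it represents matrices as nested lists and evaluates $x^*Ax$ at a single vector, which can exhibit the real-valuedness of the form or refute positive definiteness but cannot by itself prove it.

```python
def hermitian_form(A, x):
    """Return x*Ax for a square matrix A (nested lists) and a vector x."""
    n = len(x)
    return sum(complex(x[i]).conjugate() * A[i][j] * x[j]
               for i in range(n) for j in range(n))


A = [[2, 1j], [-1j, 2]]      # Hermitian: A equals its conjugate transpose
B = [[1, 1], [1, 1]]         # Hermitian, with x*Bx = |x_1 + x_2|^2

value_A = hermitian_form(A, [1 + 2j, -3j])   # a real number, since A is Hermitian
value_B = hermitian_form(B, [1, -1])         # vanishes at a nonzero vector
```

For this particular $A$ one can check by hand that $x^*Ax = 2|x_1|^2 + 2|x_2|^2 - 2\,\mathrm{Im}(\bar x_1 x_2) \ge |x_1|^2 + |x_2|^2 + (|x_1| - |x_2|)^2$, so $A$ is positive definite, while $B$ is positive semidefinite but not positive definite.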


Exercise. What do positive definite and positive semidefinite mean when 
n=1? 




Exercise. Show that if $A \in M_n$ and if $x^*Ax$ is real for all $x \in \mathbf{C}^n$, then $A$ is Hermitian. Thus, the assumption that $A$ is Hermitian is not necessary in the definition of positive definiteness. It is customary, however. Hint: Write $A = B + iC$ with $B$ and $C$ Hermitian.


Exercise. Show that if $A \in M_n$ is a real matrix and if $x^TAx$ is positive for all nonzero $x \in \mathbf{R}^n$, then $A$ need not be symmetric, and hence it need not be positive definite. Hint: Consider a real skew-symmetric matrix $A$ and compute $(x^TAx)^T$. What is $x^TAx$ in this case? What about $x^*Ax$ for nonreal $x$?


Exercise. Show that $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ is positive semidefinite but not positive definite.


Exercise. Show that if $A = [a_{ij}] \in M_n$ is positive definite, then so are $\bar A = [\bar a_{ij}]$, $A^T$, $A^*$, and $A^{-1}$. Hint: If $Ay = x$, $x^*A^{-1}x = y^*A^*y$.


Similarly, the terms negative definite and negative semidefinite may be 
defined for A by reversing the inequalities in the definitions of positive 
definite and positive semidefinite or, equivalently, by saying that —A is 
positive definite or positive semidefinite, respectively. Thus, any state- 
ment about negative definite matrices mirrors a statement about positive 
definite matrices. If a Hermitian matrix falls into none of the aforemen- 
tioned classes [i.e., if the left-hand side of (7.1.1) takes on both positive 
and negative values], it is said to be indefinite. 

Several immediate observations may be made about positive definite 
matrices, and each has an analog for positive semidefinite matrices. 


7.1.2 Observation. Any principal submatrix of a positive definite ma- 
trix is positive definite. 


Proof: Let $S$ be a proper subset of $\{1, 2, \ldots, n\}$ and denote by $A(S)$ the matrix resulting from deleting the rows and columns complementary to those indicated by $S$ from the positive definite matrix $A \in M_n$. Then $A(S)$ is a principal submatrix of $A$, and all principal submatrices arise in this way; recall that the number $\det A(S)$ is a principal minor of $A$. Let $x \in \mathbf{C}^n$ be a nonzero vector with arbitrary entries in the components indicated by $S$ and zero entries elsewhere. Let $x(S)$ denote the vector obtained from $x$ by deleting the (zero) components complementary to $S$, and observe that

$$x(S)^*A(S)\,x(S) = x^*Ax > 0$$

Since $x(S) \neq 0$ is arbitrary, this means that $A(S)$ is positive definite. □
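The identity used in the proof of (7.1.2) can be illustrated numerically. In this sketch the helper names are hypothetical, indices are 0-based, and the sample matrix is the tridiagonal matrix (7.0.6) with $\sigma = 0$ and $n = 3$, which the discussion around (7.0.7) shows to be positive definite.

```python
def hermitian_form(A, x):
    """Return x*Ax for a square matrix A (nested lists) and a vector x."""
    n = len(x)
    return sum(complex(x[i]).conjugate() * A[i][j] * x[j]
               for i in range(n) for j in range(n))


def principal_submatrix(A, S):
    """A(S): keep the rows and columns of A whose indices are in S."""
    idx = sorted(S)
    return [[A[i][j] for j in idx] for i in idx]


def embed(xS, S, n):
    """Vector of length n equal to xS on the index set S and zero elsewhere."""
    x = [0] * n
    for value, i in zip(xS, sorted(S)):
        x[i] = value
    return x


A = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]
S = {0, 2}
xS = [1 + 1j, 2]

lhs = hermitian_form(principal_submatrix(A, S), xS)   # x(S)* A(S) x(S)
rhs = hermitian_form(A, embed(xS, S, 3))              # x* A x with zero padding
```

The two values agree because the zero components of the padded vector annihilate every term involving a deleted row or column.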


Exercise. Show that the diagonal entries of a positive definite matrix are positive real numbers.


7.1.3 Observation. The sum of any two positive definite matrices of the same size is positive definite. More generally, any nonnegative linear combination of positive semidefinite matrices is positive semidefinite.

Proof: Let $A$ and $B$ be positive semidefinite, let $a, b \ge 0$, and observe that $x^*(aA + bB)x = a(x^*Ax) + b(x^*Bx) \ge 0$ for any $x \in \mathbf{C}^n$. The case of more than two summands is treated in the same way. If the coefficients $a$ and $b$ are positive, if $A$ and $B$ are positive definite, and if the vector $x$ is nonzero, then every term in the sum is positive, so a positive linear combination of positive definite matrices is positive definite. □

Thus, the set of positive definite matrices is a positive cone in the vector space of all matrices.


7.1.4 Observation. Each eigenvalue of a positive definite matrix is a positive real number.

Proof: Let $A$ be positive definite, let $\lambda \in \sigma(A)$, let $x$ be an eigenvector of $A$ associated with $\lambda$, and calculate $x^*Ax = x^*\lambda x = \lambda x^*x$. Thus, $\lambda = (x^*Ax)/x^*x$ is positive since it is a ratio of two positive numbers. □


7.1.5 Corollary. The trace, the determinant, and all principal minors of a positive definite matrix are positive.

Proof: The trace and determinant are just the sum and product of the eigenvalues. The rest follows from (7.1.2). □


Exercise. Show that the eigenvalues, trace, determinant, and principal minors of a positive semidefinite matrix are all nonnegative.


Exercise. Show that the eigenvalues and trace of an $n$-by-$n$ negative definite matrix are negative, but the determinant is negative for odd $n$ and positive for even $n$.


Exercise. Show that if $A = [a_{ij}] \in M_2$ is positive definite, then $a_{11}a_{22} > |a_{12}|^2$. Hint: Use $\det A > 0$. Deduce that if $A \in M_n$ is positive definite, then

$$a_{ii}a_{jj} > |a_{ij}|^2$$

for all $i, j = 1, 2, \ldots, n$ with $i \neq j$. Show that ">" must be replaced by "≥" in this inequality if one assumes only that $A$ is positive semidefinite.


7.1.6 Observation. Let $A \in M_n$ be positive definite. If $C \in M_{n,m}$, then $C^*AC$ is positive semidefinite. Furthermore, $\text{rank}(C^*AC) = \text{rank}(C)$, so that $C^*AC$ is positive definite if and only if $C$ has rank $m$.

Proof: First note that $C^*AC$ is Hermitian. For any $x \in \mathbf{C}^m$ we have $x^*C^*ACx = y^*Ay \ge 0$, where $y = Cx$ and the inequality follows from the positive definiteness of $A$. Thus, $C^*AC$ is positive semidefinite. Further, note that $x^*C^*ACx > 0$ if and only if $Cx \neq 0$ because $A$ is positive definite. The statement about rank (and thus about the positive definiteness of $C^*AC$) would follow if we knew that $C^*ACx = 0$ if and only if $Cx = 0$ because this would mean that $C^*AC$ and $C$ have the same null space (and hence they also have the same rank). If $Cx = 0$, then obviously $C^*ACx = 0$. Conversely, if $C^*ACx = 0$, then $x^*C^*ACx = 0$ and (using the positive definiteness of $A$ as before) we conclude that $Cx = 0$. □


Exercise. If $A \in M_n$ is positive semidefinite and not positive definite, and if $C \in M_n$, show that $C^*AC$ is always positive semidefinite and not positive definite. If $C \in M_{n,m}$ with $n \neq m$, show by example that $C^*AC$ may be positive definite even if $A \in M_n$ is singular.


Exercise. Show that the cone of positive (semi)definite matrices is invariant under *congruence. See (4.5.4).


Exercise. Let $A \in M_n$ be Hermitian. Show that $A$ is positive (semi)definite if and only if there is a nonsingular matrix $C \in M_n$ such that $C^*AC$ is positive (semi)definite.


What happens if one drops the requirement that $A$ be Hermitian? If $A$ is a matrix with real entries and if $x \in \mathbf{R}^n$, then $x^TAx$ is real and we may still ask which matrices have $x^TAx > 0$ for all $x \neq 0$ (even if $A$ is not symmetric). If $A$ is a matrix with complex entries, or if $x \in \mathbf{C}^n$ is allowed, we might replace (7.1.1) with

$$\text{Re}(x^*Ax) > 0 \quad \text{for all nonzero } x \in \mathbf{C}^n \tag{7.1.1'}$$

Define the Hermitian part of $A$ to be

$$H(A) \equiv \tfrac{1}{2}(A + A^*) \tag{7.1.7}$$

When $n = 1$ this is just the real part of the complex number $A$.


Exercise. Show that (7.1.1') holds if and only if $H(A)$ is positive definite.


Exercise. Show that for any $A \in M_n$, $A = H(A) + S(A)$, where $S(A) \equiv \tfrac{1}{2}(A - A^*)$ is the skew-Hermitian part of $A$.
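The decomposition into Hermitian and skew-Hermitian parts can be sketched as follows. The function names are hypothetical and the matrices are small nested lists chosen for illustration; the example uses a real matrix that is not symmetric, so it is not positive definite in the sense of (7.1.1), yet its Hermitian part is.

```python
def conj_transpose(A):
    """Return A* for a square matrix given as nested lists."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def hermitian_part(A):
    """H(A) = (A + A*)/2, as in (7.1.7)."""
    As, n = conj_transpose(A), len(A)
    return [[(A[i][j] + As[i][j]) / 2 for j in range(n)] for i in range(n)]


def skew_hermitian_part(A):
    """S(A) = (A - A*)/2."""
    As, n = conj_transpose(A), len(A)
    return [[(A[i][j] - As[i][j]) / 2 for j in range(n)] for i in range(n)]


def hermitian_form(A, x):
    """Return x*Ax for a square matrix A and a vector x."""
    n = len(x)
    return sum(complex(x[i]).conjugate() * A[i][j] * x[j]
               for i in range(n) for j in range(n))


A = [[2, 3], [-1, 2]]                 # real, not symmetric
H = hermitian_part(A)                 # [[2, 1], [1, 2]]
S = skew_hermitian_part(A)            # [[0, 2], [-2, 0]]

x = [1 - 1j, 2j]
real_part = hermitian_form(A, x).real     # Re(x*Ax)
from_H = hermitian_form(H, x).real        # x*H(A)x, which is already real
```

Since $x^*S(A)x$ is purely imaginary, the real part of $x^*Ax$ is carried entirely by $H(A)$, which is the content of the first exercise above.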




Problems 


1. Let $A \in M_n$ be positive semidefinite and $x \in \mathbf{C}^n$. Show that $x^*Ax = 0$ if and only if $Ax = 0$. Conclude that a positive semidefinite matrix $A \in M_n$ has rank $n$ if and only if it is positive definite. Hint: Consider the quadratic polynomial $p(t) = (x + ty)^*A(x + ty)$, $t \in \mathbf{R}$. If $x^*Ax = 0$, show that $p(t) \ge 0$ for all $t$, $p(0) = 0$ and $dp/dt = 0$ at $t = 0$. Conclude that $y^*Ax = 0$ for all $y \in \mathbf{C}^n$ and hence that $Ax = 0$.


2. Show that if a positive semidefinite matrix has a zero entry on the main diagonal, then the entire row and column to which it belongs must be zero.


3. Show that if the main diagonal entries of a positive definite matrix are 
all +1, then all entries of the matrix are bounded by 1 in absolute value. 


Can equality occur? 


4. Show that a positive semidefinite matrix $A$ is of rank 1 if and only if $A$ is of the form $A = xx^*$ for some nonzero vector $x \in \mathbf{C}^n$.


5. Let $A = [a_{ij}] \in M_n$ be positive definite. Show that the matrix $[a_{ij}/(a_{ii}a_{jj})^{1/2}]$ is positive definite, that all its main diagonal entries are +1, and that all its entries are bounded by 1 in absolute value. Such a matrix is called a correlation matrix. Hint: Find a congruence by a certain real diagonal matrix.


6. If $A$ has real entries, show that the requirement $x^TAx > 0$ for all nonzero $x \in \mathbf{R}^n$ depends only on $H(A)$.


7. Show that statements analogous to (7.1.2), (7.1.3), (7.1.4), and (7.1.6) E 
Ẹ Compute this integral. 


qi 18. Use (7.1.6) to show that the matrix A 
2 is positive definite. Hint: What is this for 
C*AC, where C is the real matrix 


hold for matrices A €e M,(C) such that H(A) is positive definite. 


8. A function $f: \mathbf{R} \to \mathbf{C}$ is said to be a positive definite function if the matrix $[f(x_i - x_j)] \in M_n$ is positive semidefinite for all choices of points $\{x_1, x_2, \ldots, x_n\} \subset \mathbf{R}$ and all $n = 1, 2, \ldots$. Show that $f(-x) = \overline{f(x)}$ for all $x \in \mathbf{R}$. Use the fact that the determinant of a positive semidefinite matrix is nonnegative to show that if $f$ is a positive definite function, then

(a) $f(0) \ge 0$,

(b) $f$ is a bounded function, and $|f(x)| \le f(0)$ for all $x \in \mathbf{R}$,

(c) If $f$ is continuous at 0, then it is continuous everywhere.


ked PJ re 


A= 
A= 
n=), 


9. If $f_1(x), f_2(x), \ldots, f_n(x)$ are positive definite functions and if $a_1, a_2, \ldots, a_n$ are nonnegative real numbers, show that the function $f(x) = a_1 f_1(x) + \cdots + a_n f_n(x)$ is a positive definite function.


` 10. Show that the function elt 
© given = +e 
given te R. Use Problem 9 to show that IX) Sao 4 seg ota* i 
> isa 


~ 11. Prove that the function cos 


- cos(x) =(e*4e7/ 
0. Conclude that posta) = (ee) /2. 


: a : ae from Theorem 
H f(x) and g(x iti i i 

maitfa epost B a ca 
E Bethy sens ae Problem 14 that the function 1 /(1+x?) 
A and (7.0.3) with f(x) 
E o 
| 17. Show that the matrix A = 
f 1, 2, ..., nis positive definite for 


Why is C nonsingular? Notice that t 
Subtract the first row 


7.1 Definitions and properties 
401 
1s a positive definite function for each 


positive defini : : 

a lefinite function for any choice of points ¢ t,ER 

nonnegative real numbers a,,..., a Defn ER and any 
Tas az, 


(x) is a positive definite function. Hint: 


12. Is $\sin(x)$ a positive definite function?


13. If $g(x)$ is a nonnegative and integrable function on $\mathbf{R}$, show that the function

$$f(x) = \int_{-\infty}^{\infty} e^{itx} g(t)\, dt$$

is a positive definite function. Hint: Use the definition.


14. Prove that the function $f(x) = 1/(1 - ix)$ is a positive definite function. Hint: Let $g(t) = e^{-t}$ for $t > 0$, $g(t) = 0$ for $t < 0$ in Problem 13.
(7.5.3) [see Problem 2 in Section (7.5)] that 


| (x) |?, and 
= 1 to show that the matrix 4 = 
=1,2,...,” is positive definite for 


[a;;] € M, with a; =1/(i +j) for i FE 
alln=1,2,.... Hint: For all x = [x;]e R” 


æ n 2 
=K 
i (> Xe ‘ dt =0 


18. Show that the matrix $A = [a_{ij}] \in M_n$ with $a_{ij} = \min\{i, j\}$ is positive definite. Hint: What if $n = 4$? Consider the congruence

$$C = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \in M_4$$

The effect of this congruence is to subtract the first row (and column) of $A$ from all the other rows (and columns). Now observe the form of the lower right-hand $(n-1)$-by-$(n-1)$ submatrix of $C^*AC$ and perform a suitable congruence on it to reduce it in the same way. Conclude that $A$ is *congruent to $I$.

19. Use Problem 18 and a limiting argument to show that the kernel $K(s, t) = \min\{s, t\}$ is positive semidefinite on $[0, N]$ for any $N > 0$, that is, that

$$\int_0^N \!\! \int_0^N K(s, t)\, f(s)\, \overline{f(t)}\; ds\, dt \ge 0 \qquad (7.1.8)$$

for all continuous complex-valued functions $f(\cdot)$ on $[0, N]$. Hint: Express the integral as the limit of Riemann sums over partitions of $[0, N]$ with equally spaced points.

20. Prove the identity

$$\int_0^N \!\! \int_0^N \min\{s, t\}\, f(s)\, \overline{f(t)}\; ds\, dt = \int_0^N \Big| \int_t^N f(s)\, ds \Big|^2 dt$$

for all continuous complex-valued functions $f(\cdot)$ on $[0, N]$ and use it to give an alternate proof of the assertion in Problem 19. This proof gives the stronger result that the kernel is positive definite; that is, equality holds in (7.1.8) if and only if $f$ is identically zero. Hint: Express the double integral as an iterated integral and integrate by parts.


7.2 Characterizations


There are several useful and simple characterizations of positive definite matrices.

7.2.1 Theorem. A Hermitian matrix $A \in M_n$ is positive semidefinite if and only if all of its eigenvalues are nonnegative. It is positive definite if and only if all of its eigenvalues are positive.

Proof: If each eigenvalue of $A$ is positive, then for any nonzero $x \in \mathbf{C}^n$ we have

$$x^*Ax = x^*U^*DUx = y^*Dy = \sum_{i=1}^n d_i \bar{y}_i y_i = \sum_{i=1}^n d_i |y_i|^2 > 0$$

where $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ is the diagonal matrix of eigenvalues of $A$, $y = Ux$, and $U$ is unitary. The reverse implication is contained in Observation (7.1.4), and the positive semidefinite case is similar. □

Exercise. Show that a nonsingular $A \in M_n$ is positive definite if and only if $A^{-1}$ is positive definite.

Exercise. Let $A \in M_n$ be positive semidefinite. Use (7.2.1) to show that $A$ is positive definite if and only if rank $A = n$. Compare with Problem 1 of Section (7.1).

7.2.2 Corollary. If $A \in M_n$ is positive semidefinite, then so are all the powers $A^k$, $k = 1, 2, 3, \ldots$.

Proof: If the eigenvalues of $A$ are $\lambda_1, \ldots, \lambda_n$, then the eigenvalues of $A^k$ are $\lambda_1^k, \ldots, \lambda_n^k$. □

7.2.3 Corollary. If $A = [a_{ij}] \in M_n$ is Hermitian and strictly diagonally dominant and if $a_{ii} > 0$ for all $i = 1, 2, \ldots, n$, then $A$ is positive definite.

Proof: This is part of Theorem (6.1.10). The conditions imply that each Geršgorin disc for $A$ lies in the open right half-plane. Since the eigenvalues of a Hermitian matrix are all real, the eigenvalues of $A$ must all be positive, and hence $A$ is positive definite by the theorem. □

Exercise. If a Hermitian matrix $A$ is *congruent to a strictly diagonally dominant matrix with positive diagonal entries, show that $A$ is positive definite.

The next characterization is not very practical for computational determination of positive definiteness, but it can be of theoretical utility.

7.2.4 Corollary. Let $A \in M_n$ be Hermitian, and let

$$p_A(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_{n-m}t^{n-m}$$

be the characteristic polynomial of $A$. Suppose that $0 \le m \le n$ and $a_{n-m} \ne 0$. Then $A$ is positive semidefinite if and only if $a_k \ne 0$ for all $n-m \le k \le n$ and $a_k a_{k+1} < 0$ for $k = n-m, \ldots, n-1$. We define $a_n \equiv 1$.

Proof: The assertion is just that the leading coefficients $a_k$ are nonzero and alternate strictly in sign. If this condition is met, $p_A(t)$ cannot have any negative zeroes; all the eigenvalues of $A$ must therefore be nonnegative. Conversely, if $A$ is positive semidefinite, denote its positive eigenvalues by $\lambda_1, \lambda_2, \ldots, \lambda_m$ (the remaining $n-m$ eigenvalues are all 0). By induction, one can show that the coefficients of the polynomials $(t-\lambda_1)$, $(t-\lambda_1)(t-\lambda_2)$, ..., $(t-\lambda_1)(t-\lambda_2)\cdots(t-\lambda_m)$ are all nonzero and alternate in sign. Multiplying by $t^{n-m}$ gives $p_A(t)$. □

In order to facilitate the next characterization, we denote by $A_i$ the leading principal submatrix of $A$ determined by the first $i$ rows and




columns, $A_i \equiv A(\{1, 2, \ldots, i\})$, $i = 1, 2, \ldots, n$. We have already noted that if $A$ is positive definite, then all principal minors of $A$ are positive, and, in fact, the converse is valid when $A$ is Hermitian. However, an even stronger statement may be made. Note that if $A$ is Hermitian, so is each $A_i$, and therefore each $A_i$ has a real determinant.


7.2.5 Theorem. If $A \in M_n$ is Hermitian, then $A$ is positive definite if and only if $\det A_i > 0$ for $i = 1, 2, \ldots, n$. More generally, the positivity of any nested sequence of $n$ principal minors of $A$ (not just the leading principal minors) is necessary and sufficient for $A$ to be positive definite.


Proof: Because of (7.1.5), we know that $\det A_i > 0$ for all $i = 1, 2, \ldots, n$ whenever $A$ is positive definite. We use induction and the interlacing inequalities for a Hermitian matrix (4.3.8) to prove the converse. Since $\det A_1 > 0$ and $A_1$ is 1-by-1, $A_1$ is positive definite. If $A_k$ is positive definite for some $k < n$, all the eigenvalues of $A_k$ are positive and thus, by the interlacing inequalities, all the eigenvalues of $A_{k+1}$ are positive except perhaps for the smallest eigenvalue. But the product of the eigenvalues of $A_{k+1}$ is just $\det A_{k+1}$, which is assumed to be positive, so there cannot be just one negative eigenvalue for $A_{k+1}$. We conclude that even the smallest eigenvalue of $A_{k+1}$ is positive, and hence $A_{k+1}$ must be positive definite. Since $A_n = A$, we are done. For the case of a general nested sequence, just consider appropriate permutations of the rows and columns of $A$. □


Theorem (7.2.5) says that a Hermitian matrix is positive definite if (and only if) its leading principal minors are positive. Thus, noting (7.2.1), either of two sets of numbers associated with $A$ may be checked in order to verify positive definiteness.
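The test in (7.2.5) is easy to carry out exactly for small matrices with integer or rational entries. The following Python sketch is only an illustration under stated assumptions: the function names `det`, `leading_principal_minors`, and `is_positive_definite` and the two sample matrices are our own choices, and the input is assumed to be real symmetric.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Look for a nonzero pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def leading_principal_minors(m):
    """det A_1, ..., det A_n for the leading principal submatrices A_i."""
    return [det([row[:i] for row in m[:i]]) for i in range(1, len(m) + 1)]


def is_positive_definite(m):
    """Criterion of (7.2.5); assumes m is real symmetric."""
    return all(d > 0 for d in leading_principal_minors(m))


# Hypothetical sample matrices, chosen only for illustration.
A = [[5, -1, 3], [-1, 2, -2], [3, -2, 3]]
B = [[1, 2], [2, 1]]
minors_A = leading_principal_minors(A)
minors_B = leading_principal_minors(B)
```

For the sample matrix `A` every leading principal minor is positive, while for `B` the second minor is negative, so `B` is not positive definite even though its diagonal entries are positive.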


Exercise. Use (7.2.5) to show that the matrix 


5-1 3 
AS ead! ea 
3 —-2 3 


is positive definite. 
Exercise. Show that the leading principal minors of the symmetric matri 


i eh are nonnegative, but it is not positive semidefinite. 


Exercise. Let $A \in M_n$ be Hermitian and suppose that $\det A_1 > 0$, $\det A_2 > 0$, ..., $\det A_{n-1} > 0$, and $\det A_n \ge 0$. Show that $A$ is positive semidefinite. Hint: What do the interlacing inequalities say about the eigenvalues of $A_n$ as compared with those of $A_{n-1}$?


eM, as


bod
2 Tot
LA-


for suitable values of
t to show that, by itself, this i i
i ! } , this is not sufficien
ane peace Show that if, in addition, some (n—1)-by-(n on a
p matrix is diagonally dominant, then this condition is ese


Exercise. Let $A \in M_n$ be Hermitian. Show that $A$ is positive semidefinite if and only if there exist Hermitian matrices $A_\epsilon$ with $A_\epsilon \to A$ as $\epsilon \to 0$ such that every principal submatrix of $A_\epsilon$ has positive determinant. Conclude that if all principal minors of $A$ are nonnegative, then $A$ is positive semidefinite.


Every positive real number has a unique positive $k$th root for all $k = 1, 2, \ldots$. A similar result holds for positive definite matrices.


7.2.6 Theorem. Let $A \in M_n$ be positive semidefinite and let $k \ge 1$ be a given integer. Then there exists a unique positive semidefinite Hermitian matrix $B$ such that $B^k = A$. We also have

(a) $BA = AB$ and there is a polynomial $p(t)$ such that $B = p(A)$;
(b) rank $B$ = rank $A$, so $B$ is positive definite if $A$ is; and
(c) $B$ is real if $A$ is real.


Proof: We know that the Hermitian matrix $A$ can be unitarily diagonalized as $A = U\Lambda U^*$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and all $\lambda_i \ge 0$. We define $B \equiv U\Lambda^{1/k}U^*$, where $\Lambda^{1/k} \equiv \mathrm{diag}(\lambda_1^{1/k}, \ldots, \lambda_n^{1/k})$ and the unique nonnegative $k$th root is taken in each case. Clearly $B^k = A$ and $B$ is Hermitian. Also,

$$AB = U\Lambda U^*U\Lambda^{1/k}U^* = U\Lambda\Lambda^{1/k}U^* = U\Lambda^{1/k}\Lambda U^* = U\Lambda^{1/k}U^*U\Lambda U^* = BA,$$

and $B$ is positive semidefinite because the $\lambda_i$ (and hence their $k$th roots) are nonnegative. The rank of $B$ is just the number of nonzero $\lambda_i$ terms, which is also the rank of $A$. If $A$ is real and positive semidefinite, then we know that $U$ may be chosen to be a real orthogonal matrix, so it is clear that $B$ can be chosen to be real in this case.

It remains only to consider the question of uniqueness. Notice first that there is a polynomial $p(t)$ such that $p(A) = B$; we need only choose $p(t)$ to be the Lagrange interpolating polynomial (0.9.11) for the set $\{(\lambda_1, \lambda_1^{1/k}), \ldots, (\lambda_n, \lambda_n^{1/k})\}$ to get $p(\Lambda) = \Lambda^{1/k}$ and $p(A) = p(U\Lambda U^*) = Up(\Lambda)U^* = U\Lambda^{1/k}U^* = B$. But then if $C$ is any positive semidefinite Hermitian matrix such that $C^k = A$, we have $B = p(A) = p(C^k)$ so that $CB = Cp(C^k) = p(C^k)C = BC$. Since $B$ and $C$ are commuting Hermitian matrices, they may be simultaneously unitarily diagonalized; that is, there is some unitary matrix $V$ and diagonal matrices $\Lambda_1$ and $\Lambda_2$ with nonnegative diagonal entries such that $B = V\Lambda_1V^*$ and $C = V\Lambda_2V^*$. Then from the fact that $B^k = A = C^k$ we deduce that $\Lambda_1^k = \Lambda_2^k$. But since the nonnegative $k$th root of a nonnegative number is unique, we conclude that $(\Lambda_1^k)^{1/k} = \Lambda_1 = \Lambda_2 = (\Lambda_2^k)^{1/k}$ and $B = C$. □

The most useful case of the preceding theorem is for $k = 2$. The unique positive (semi)definite square root of the positive (semi)definite matrix $A$ is usually denoted by $A^{1/2}$. Similarly, $A^{1/k}$ denotes the unique positive (semi)definite $k$th root of $A$ for each $k = 1, 2, \ldots$.
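For a real symmetric positive semidefinite 2-by-2 matrix the square root can be written in closed form, which makes a convenient check on hand computations. The sketch below is not taken from the text: it relies on the Cayley-Hamilton identity $B^2 - (\mathrm{tr}\, B)B + (\det B)I = 0$ applied to $B = A^{1/2}$, which gives $A + sI = tB$ with $s = \sqrt{\det A}$ and $t = \sqrt{\mathrm{tr}\, A + 2s}$. The function name `sqrt_psd_2x2` and the sample matrix are our own illustrative choices.

```python
import math


def sqrt_psd_2x2(m):
    """Positive semidefinite square root of a real symmetric PSD 2-by-2 matrix.

    Uses A^{1/2} = (A + s I) / t with s = sqrt(det A), t = sqrt(tr A + 2 s).
    """
    (p, q), (_, r) = m
    s = math.sqrt(max(p * r - q * q, 0.0))  # sqrt(l1 * l2); clamp rounding noise
    t = math.sqrt(p + r + 2.0 * s)          # sqrt(l1) + sqrt(l2)
    if t == 0.0:
        # Both eigenvalues vanish, so A = 0 and its square root is 0.
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(p + s) / t, q / t], [q / t, (r + s) / t]]


# Hypothetical sample matrix, chosen only for illustration.
A = [[5.0, 3.0], [3.0, 2.0]]
B = sqrt_psd_2x2(A)

# Squaring B should recover A.
B_squared = [[sum(B[i][k] * B[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
```

The closed form is special to the 2-by-2 case; for larger matrices one would follow the proof of (7.2.6) and take $k$th roots of the eigenvalues in a unitary diagonalization.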


Exercise. Determine $\begin{bmatrix} 5 & 3 \\ 3 & 2 \end{bmatrix}^{1/2}$.

Exercise. If $A$ is positive definite, show that $(A^{1/2})^{-1} = (A^{-1})^{1/2}$.

7.2.7 Theorem. A matrix $B \in M_n$ is positive definite if and only if there is a nonsingular matrix $C \in M_n$ such that $B = C^*C$.

Proof: If $B$ can be so written, then $B$ is positive definite by (7.1.6). If $B$ is positive definite, let $C = B^{1/2}$ to show that the asserted factorization can be achieved and that $C$ can even be taken to be Hermitian. □

7.2.8 Corollary. A Hermitian matrix $A$ is positive definite if and only if it is *congruent to the identity.

Proof: This is simply a restatement of (7.2.7).

Exercise. If $A \in M_n$ is positive definite, and if $A = C_1^*C_1$ and $A = C_2^*C_2$ with $C_1, C_2 \in M_n$, then show that $C_2 = VC_1$, where $V$ is a unitary matrix. In particular, show that any solution $C$ to $A = C^*C$ is of the form $C = VA^{1/2}$ with $V$ unitary. Hint: Show that

$$A^{-1/2}C^*CA^{-1/2} = (CA^{-1/2})^*(CA^{-1/2}) = I$$

Sometimes it can be useful to know that the factorization $A = C^*C$ of a positive semidefinite matrix can be specialized somewhat. Every square matrix $C$ has a QR factorization (2.6.1) and can be written as $C = QR$, where $Q$ is unitary and $R$ is an upper triangular matrix with the same rank as $C$. But then $A = C^*C = (QR)^*QR = R^*Q^*QR = R^*R$. If $C$ is nonsingular, $R$ may be chosen so that all its diagonal entries are positive (in fact, there is a unique factorization $C = QR$ of this type), and if $C$ is real, both $Q$ and $R$ may be taken to be real. This establishes the following corollary, which gives the Cholesky decomposition of $A$.

7.2.9 Corollary. A matrix $A$ is positive definite if and only if there is a nonsingular lower triangular matrix $L \in M_n$ with positive diagonal entries such that $A = LL^*$. If $A$ is real, $L$ may be taken to be real.

Let $\{v_1, \ldots, v_k\}$ be a set of $k$ given vectors in an inner product space $V$, and let $\langle \cdot, \cdot \rangle$ be a given inner product on $V$. The Gram matrix of the vectors $\{v_1, \ldots, v_k\}$ with respect to the inner product $\langle \cdot, \cdot \rangle$ is the matrix $G = [g_{ij}] \in M_k$ defined by $g_{ij} = \langle v_j, v_i \rangle$. One characterization of positive semidefinite matrices is that they are Gram matrices.

7.2.10 Theorem. Let $G \in M_k$ be the Gram matrix of the vectors $\{w_1, \ldots, w_k\} \subset \mathbf{C}^n$ with respect to a given inner product $\langle \cdot, \cdot \rangle$ and let $W = [w_1\ w_2\ \ldots\ w_k] \in M_{n,k}$. Then

(a) $G$ is positive semidefinite;
(b) $G$ is nonsingular if and only if the vectors $w_1, \ldots, w_k$ are independent;
(c) There exists a positive definite matrix $A \in M_n$ such that $G = W^*AW$; and
(d) rank $G$ = rank $W$ = maximum number of independent vectors in the set $\{w_1, \ldots, w_k\}$.

Proof: If $G = [g_{ij}]$ with $g_{ij} = \langle w_j, w_i \rangle$, then $G$ is Hermitian because an inner product is Hermitian, and

$$x^*Gx = \sum_{i,j=1}^k g_{ij}\bar{x}_i x_j = \sum_{i,j=1}^k \langle w_j, w_i \rangle \bar{x}_i x_j = \sum_{i,j=1}^k \langle x_j w_j, x_i w_i \rangle = \Big\langle \sum_{j=1}^k x_j w_j, \sum_{i=1}^k x_i w_i \Big\rangle = \Big\| \sum_{i=1}^k x_i w_i \Big\|^2 \ge 0$$

where $\|\cdot\|$ is the norm derived from the given inner product. By positive definiteness of the norm, equality can hold only if

$$\sum_{i=1}^k x_i w_i = 0$$

and this can happen for a nontrivial set of coefficients $x_i$ only if the given vectors $\{w_i\}$ are dependent. If $G$ is singular, there is some nonzero vector $x$ such that $Gx = 0$ and hence $x^*Gx = 0$, which implies that the set $\{w_i\}$ is dependent. Conversely, if $\sum x_i w_i = 0$ with $\sum |x_i| \ne 0$, then we have shown that $x^*Gx = 0$, so $G$ must be singular.

If $\{e_1, \ldots, e_n\}$ is the standard orthonormal basis of $\mathbf{C}^n$, then the matrix $A \equiv [\langle e_j, e_i \rangle]$ is positive definite by (a) and (b). For any vectors $x, y \in \mathbf{C}^n$ we have

$$\langle y, x \rangle = \Big\langle \sum_{j=1}^n y_j e_j, \sum_{i=1}^n x_i e_i \Big\rangle = \sum_{i,j=1}^n \langle e_j, e_i \rangle \bar{x}_i y_j = x^*Ay$$

so $\langle w_j, w_i \rangle = w_i^*Aw_j$ and hence $G = W^*AW$.

If $Gx = 0$, then $0 = x^*Gx = x^*W^*AWx = (Wx)^*A(Wx)$, which implies that $Wx = 0$ since $A$ is positive definite. Conversely, $Wx = 0$ implies that $Gx = W^*A(Wx) = 0$, so $G$ and $W$ have the same null space and hence they have the same rank. The column rank of $W$ is the maximum number of independent vectors in the set $\{w_1, \ldots, w_k\}$. □
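Returning to the Cholesky decomposition of Corollary (7.2.9): the text obtains it from a QR factorization, but for a real symmetric positive definite matrix the factor $L$ can also be computed entry by entry from $A = LL^T$. The sketch below uses that standard row-by-row recurrence rather than the QR argument of the text; the function name `cholesky`, the floating-point arithmetic, and the sample matrix are our own illustrative choices.

```python
import math


def cholesky(a):
    """Lower triangular L with positive diagonal such that A = L L^T.

    Assumes `a` is real symmetric; raises ValueError when a nonpositive
    pivot shows that `a` is not positive definite.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Inner product of the already computed parts of rows i and j.
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(d)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


# Hypothetical sample matrix, chosen only for illustration.
A = [[4.0, 2.0], [2.0, 5.0]]
L = cholesky(A)
```

Because the recurrence fails exactly when a pivot is not positive, attempting the factorization is itself a practical test for positive definiteness of a real symmetric matrix.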


Exercise. The most common application of the theorem is to the case in which the given inner product is just the usual Euclidean inner product $\langle x, y \rangle = y^*x$. Show that $A = I$ in this case and deduce that the maximum number of independent vectors in a given set $\{w_1, \ldots, w_k\} \subset \mathbf{C}^n$ is exactly the rank of the matrix $G = [w_i^*w_j] \in M_k$.
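The rank statement in the preceding exercise can be checked numerically on a small example. The sketch below is an illustration only: it assumes real vectors with integer entries and the Euclidean inner product, and the helper names `gram` and `rank` and the sample vectors are our own choices.

```python
from fractions import Fraction


def gram(vectors):
    """Gram matrix G = [w_i . w_j] for the Euclidean inner product (real case)."""
    return [[sum(x * y for x, y in zip(wi, wj)) for wj in vectors]
            for wi in vectors]


def rank(m):
    """Rank by exact Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(a), len(a[0])
    rk = 0
    for c in range(cols):
        pivot = next((r for r in range(rk, rows) if a[r][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        a[rk], a[pivot] = a[pivot], a[rk]
        for r in range(rk + 1, rows):
            f = a[r][c] / a[rk][c]
            for cc in range(c, cols):
                a[r][cc] -= f * a[rk][cc]
        rk += 1
    return rk


# Hypothetical sample vectors: the third is the sum of the first two,
# so exactly two of them are independent.
w = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]
G = gram(w)
rank_G = rank(G)
rank_W = rank(w)  # rows of `w` are the vectors; row rank equals column rank
```

Here both ranks come out equal to 2, in agreement with part (d) of Theorem (7.2.10).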


7.2.11 Corollary. Let $A \in M_n$ be a given matrix. Then $A$ is positive semidefinite with rank $r \le n$ if and only if there is a set of vectors $S = \{w_1, \ldots, w_n\} \subset \mathbf{C}^n$ containing exactly $r$ independent vectors such that $A$ is the Gram matrix of $S$ with respect to the Euclidean inner product.


Proof: The "if" part has been treated in the theorem. For the "only if" part, use (7.2.6) to write $A = B^2$ with $B$ positive semidefinite. The rank of $B$ is the same as the rank of $A$ and $A = B^2 = B^*B$ is the Gram matrix of the columns of $B$ in the Euclidean inner product. □


Problems 


1. Show that if $A$ is a Hermitian matrix, then $A^{2k}$ is positive semidefinite for all $k = 1, 2, \ldots$ and $e^A$ is positive definite. See the exercises following (5.6.15).


2. If $A$ is positive semidefinite, and if $p(t)$ is any polynomial such that $p(t) > 0$ for all $t \ge 0$, show that $p(A)$ is positive semidefinite. Hint: What are the eigenvalues of $p(A)$? How does this generalize Problem 1?


3. Use (7.2.5) to show that the matrix $A = [a_{ij}] \in M_n$ defined by $a_{ij} = \min\{i, j\}$ is positive definite. Hint: Calculate $\det A_i$; subtract the first row from all the other rows, then do the same for the first column. What about $a_{ij} = \max\{i, j\}$?




4. If A and B are positive definite, show that [3 A is positive definite. 


5. Give an example of a real square (non-Hermitian) matrix whose lead- 
ing principal minors are positive but such that some eigenvalue has nega- 
tive real part. 


6. Provide the details for the general inequalities in (7.2.5). That is, show 
that the positivity of any nested sequence of n principal minors (nested 
by inclusion, not necessarily the leading principal minors) is sufficient for 
the positive definiteness of an n-by-n Hermitian matrix. 


7. What are necessary and sufficient conditions for A to be negative defi- 
nite (semidefinite) in terms of the signs of the minors? 


8. Are there “square roots” of the positive semidefinite matrix $A$ other than $A^{1/2}$? How many? Are there $k$th roots other than $A^{1/k}$? Are there


DP , . ~1 172 
non-Hermitian square roots? Hint: Consider [ o i] . 


9. If $B \in M_n$ is positive semidefinite and has rank $m$, show that there exists an $m$-by-$n$ matrix $C$ with rank $m$ such that $B = C^*C$. In particular, note that a rank 1 positive semidefinite matrix may always be written in the form $xx^*$ for some $x \in \mathbf{C}^n$.


10. Suppose $A \in M_n$ is positive semidefinite and has rank $r \le n$. Show that $A$ has an $r$-by-$r$ positive definite principal submatrix.


11. Let $A \in M_n$ be Hermitian. Show that $A$ is positive definite if and only if the classical adjoint adj $A$ is positive definite and $\det A > 0$. If $A$ is positive semidefinite, show that adj $A$ is positive semidefinite and $\det A \ge 0$. Hint: Consider $A_\epsilon \equiv A + \epsilon I$, $\epsilon > 0$. Consider $A = \mathrm{diag}(0, 0, -1)$ to show that one can have adj $A$ positive semidefinite and $\det A \ge 0$ without having $A$ positive semidefinite.


12. Let $r \in (0, 1)$ be given, and consider the real symmetric Toeplitz matrix $A = [a_{ij}] \in M_n$ defined by $a_{ij} = r^{|i-j|}$. Show that $A$ is positive definite as follows: (a) If $A_{ij}$ is the $i, j$ minor of $A$, show that $\det A_{ij} = 0$ whenever $|i - j| \ge 2$. Hint: If $i = 1$ and $j > 2$, observe that the first column of $A_{ij}$ is a multiple of the second column. (b) Let $D_n \equiv \det A$. Show that $D_2 = 1 - r^2$ and use (a) to show that $D_{n+1} = D_n - r^2D_n = (1 - r^2)D_n = (1 - r^2)^n$ by expanding according to cofactors of the first row. (c) Use (7.2.5) to conclude that $A$ is positive definite.


13. Show that the matrix $A$ in Problem 12 has an inverse that is real, symmetric, and tridiagonal, and that $(1 - r^2)A^{-1}$ has the entry $-r$ in every position of the superdiagonal and subdiagonal and has main diagonal entries $1, 1 + r^2, \ldots, 1 + r^2, 1$. Hint: Use Problem 12(a) to show that $A^{-1}$ is tridiagonal. Why must $A^{-1}$ be symmetric? Now determine the elements of $A^{-1}$ using $AA^{-1} = A^{-1}A = I$.


14. Let $\langle \cdot, \cdot \rangle$ be a given inner product on $\mathbf{C}^n$, let $\mathcal{B} = \{e_1, \ldots, e_n\}$ be the usual (with respect to the usual Euclidean inner product) orthonormal basis for $\mathbf{C}^n$, and let $G \in M_n$ denote the Gram matrix of $\mathcal{B}$ with respect to the given inner product $\langle \cdot, \cdot \rangle$. Show that

$$\langle x, y \rangle = y^*Gx \qquad (7.2.12)$$

for all $x, y \in \mathbf{C}^n$. Conclude that a function $\langle \cdot, \cdot \rangle : \mathbf{C}^n \times \mathbf{C}^n \to \mathbf{C}$ is an inner product if and only if there is a positive definite matrix $G$ such that (7.2.12) holds.


15. Recall the notion of a dual norm defined in (5.4.12). Let $\langle \cdot, \cdot \rangle$ be a given inner product on $\mathbf{C}^n$ and let $\|\cdot\|$ be a given norm on $\mathbf{C}^n$. The given norm is not necessarily induced by the given inner product. Define the dual norm of $\|\cdot\|$ with respect to the inner product $\langle \cdot, \cdot \rangle$ by

$$\|x\|^D_{\langle \cdot, \cdot \rangle} \equiv \max_{\|y\| = 1} |\langle x, y \rangle|$$

Notice that this is the usual dual of $\|\cdot\|$ if $\langle \cdot, \cdot \rangle$ is the Euclidean inner product. Does this extension of the notion of a dual norm produce any vector norms that we have not already generated by other means? Use Problem 14 to write $\langle x, y \rangle = y^*Gx$, and show that $\|x\|^D_{\langle \cdot, \cdot \rangle} = \|Gx\|^D$.


16. Let $A \in M_n$ be given. Show that $\rho(A) < 1$ if and only if there exists a positive definite matrix $B \in M_n$ such that $B - A^*BA$ is positive definite. Hint: If $B$ is positive definite, let $C = B^{1/2}$. If

$$B - A^*BA = C^*C - (CA)^*(CA)$$

is positive definite, then for any nonzero $x \in \mathbf{C}^n$ we have

$$x^*[C^*C - (CA)^*(CA)]x > 0$$

or $\|Cx\|_2 > \|CAx\|_2$. Let $y = Cx$ to show that $\|y\|_2 > \|CAC^{-1}y\|_2$ for any nonzero $y \in \mathbf{C}^n$ and conclude that $|||CAC^{-1}|||_2 < 1$. Thus, $\rho(A) = \rho(CAC^{-1}) \le |||CAC^{-1}|||_2 < 1$. Conversely, if $\rho(A) < 1$ there exists a nonsingular $C \in M_n$ such that $|||CAC^{-1}|||_2 < 1$ [see Section (5.6), Problem 25] and the preceding argument can be reversed with $B = C^*C$.


17. Let $A, B \in M_n$ be positive semidefinite and not both singular. Show that $|||A - B|||_2 \le |||A^2 - B^2|||_2 / [\lambda_{\min}(A) + \lambda_{\min}(B)]$. Hint: Let $E \equiv A - B$ and let $x \in \mathbf{C}^n$ be a unit vector such that $Ex = \lambda x$ and $|\lambda| = \rho(E) = |||E|||_2$. Then $A^2 - B^2 = AE + EA - E^2$ and

$$|||A^2 - B^2|||_2 \ge |x^*(AE + EA - E^2)x| = |\lambda|(x^*Ax + x^*Bx) \ge |\lambda|(\lambda_{\min}(A) + \lambda_{\min}(B))$$


18. Let $A, B \in M_n$ be positive semidefinite and suppose $A$ is positive definite. Use Problem 17 to show that

$$|||A^{1/2} - B^{1/2}|||_2 \le |||A^{-1/2}|||_2\, |||A - B|||_2 \qquad (7.2.13)$$

and explain why this inequality implies that the function $f : C \to C^{1/2}$, defined on the set of positive semidefinite matrices in $M_n$, is continuous on the interior of this set, which is the open set of positive definite matrices. State and prove directly the inequality for the ordinary scalar square root function $f : t \to \sqrt{t}$ on $[0, \infty)$ that results from setting $n = 1$ in (7.2.13).


7.3 The polar form and the singular value decomposition


We next develop two important related factorizations of complex matrices (not necessarily square) which depend heavily on positive definiteness.


7.3.1 Lemma. Let $A \in M_{m,n}$ with $m \le n$ and rank $A = k \le m$. There exists a unitary matrix $X \in M_m$, a diagonal matrix $\Lambda \in M_m$ with nonnegative diagonal entries $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > \lambda_{k+1} = \cdots = \lambda_m = 0$, and a matrix $Y \in M_{m,n}$ with orthonormal rows such that $A = X\Lambda Y$. The matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is always uniquely determined and $\{\lambda_1^2, \ldots, \lambda_m^2\}$ are the eigenvalues of $AA^*$. The columns of the matrix $X$ are eigenvectors of $AA^*$. If $AA^*$ has distinct eigenvalues, then $X$ is determined up to a right diagonal factor $D = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_m})$ with all $\theta_i \in \mathbf{R}$; that is, if $A = X_1\Lambda Y_1 = X_2\Lambda Y_2$, then $X_2 = X_1D$. Given $X$, the matrix $Y$ is uniquely determined if rank $A = m$. If $A$ is real, then $X$ and $Y$ may be taken to be real.


Proof: If $A = X\Lambda Y$ is a factorization of the asserted form, then $AA^* = X\Lambda YY^*\Lambda X^* = X\Lambda I\Lambda X^* = X\Lambda^2X^*$, so $X\Lambda^2X^*$ is a unitary diagonalization of the Hermitian matrix $AA^*$. If $X = [x_1\ x_2\ \ldots\ x_m]$ and if $\Lambda^2 = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$, then $AA^*x_j = \lambda_j^2x_j$, $j = 1, 2, \ldots, m$, and the vectors $\{x_j\}$ are orthonormal. Because the diagonal entries of $\Lambda$ are to be nonnegative and are to be arranged in nonincreasing order, $\Lambda$ is uniquely determined by $AA^*$. If the numbers $\{\lambda_i^2\}$ are distinct, the corresponding normalized eigenvectors of $AA^*$ are each determined up to a complex scalar factor of modulus 1, so if $X_1$ and $X_2$ are unitary matrices whose columns are eigenvectors of $AA^*$, we must have $X_2 = X_1D$ with $D = \mathrm{diag}(d_1, \ldots, d_m)$ and all $|d_i| = 1$.

Eigenvectors of $AA^*$ corresponding to a multiple eigenvalue are not uniquely determined, however, but once they are chosen and orthonormalized so that the unitary matrix $X$ is fixed, then $Y = \Lambda^{-1}X^*A$ is




uniquely determined if $\Lambda$ is nonsingular, which is the case if $k = $ rank $A = m$. One checks easily that $YY^* = \Lambda^{-1}X^*AA^*X\Lambda^{-1} = \Lambda^{-1}X^*X\Lambda^2X^*X\Lambda^{-1} = \Lambda^{-1}\Lambda^2\Lambda^{-1} = I$, so this matrix $Y$ has orthonormal rows.

It remains only to handle the case in which rank $A = k < m$. Since we want $Y = \Lambda^{-1}X^*A = \Lambda^{-1}(A^*X)^*$ when all $\lambda_i \ne 0$, we are led to define the $j$th row of $Y$ to be the row vector $y_j^*$, where $y_j \equiv \lambda_j^{-1}(A^*x_j)$, $j = 1, \ldots, k$. Then for $1 \le j, \ell \le k$,

$$[\lambda_j^{-1}A^*x_j]^*[\lambda_\ell^{-1}A^*x_\ell] = x_j^*AA^*x_\ell/\lambda_j\lambda_\ell = \lambda_\ell^2\, x_j^*x_\ell/\lambda_j\lambda_\ell = (\lambda_\ell/\lambda_j)\, x_j^*x_\ell$$

which is 0 if $j \ne \ell$ and is 1 if $j = \ell$ since the vectors $\{x_j\}$ are orthonormal. The vectors $\{y_1, \ldots, y_k\}$ are an orthonormal set in $\mathbf{C}^n$, and $n \ge m > k$, so there exist $m - k$ additional (but not uniquely determined) orthonormal vectors $y_{k+1}, \ldots, y_m$ such that the matrix $Y^* = [y_1\ y_2\ \ldots\ y_k\ y_{k+1}\ \ldots\ y_m] \in M_{n,m}$ has $m$ orthonormal columns.

Now notice that $X^*A = \Lambda Y$. The first $k$ rows of both sides of this identity are equal by construction of the vectors $y_j$. The last $m - k$ rows are all 0 on the right because the last $m - k$ diagonal entries of $\Lambda$ are 0; the last $m - k$ rows are all 0 on the left because if $AA^*x_j = 0$, then $0 = x_j^*AA^*x_j = (A^*x_j)^*(A^*x_j)$ and hence $A^*x_j = 0$.

Finally, if $A$ is real, then $AA^*$ is real and has real eigenvalues, and hence the eigenvectors $X$ may be taken to be real. The first $k$ rows of $Y$, which are determined by $X$, are real by construction, and the $m - k$ orthonormal vectors that are added may be taken to be real. Thus, all the factors may be taken to be real if $A$ is real. □


Every nonzero complex number $z$ has a unique “polar representation” $z = pu$, where $p$ is a positive real number and $u$ is a complex number of modulus 1. Indeed, $p = |z|$ and $u = p^{-1}z = z/|z|$ if $z \ne 0$. If $z = 0$, then $z$ can still be written in polar form with $p = 0$, but $u$ is no longer uniquely determined. Indeed, $u$ can be any complex number of modulus 1.


How does this generalize to a complex matrix $A \in M_n$? One answer is that $A = PU$ where $P$ is positive (semi)definite and $U$ is unitary. We can even generalize to the case in which $A$ is not a square matrix.
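For a real nonsingular 2-by-2 matrix the two factors can be computed directly from $P = (AA^T)^{1/2}$ and $U = P^{-1}A$. The following sketch is an illustration under those assumptions only (real entries, nonsingular $A$, size 2); the helper names and the sample matrix are our own, and the square root uses the 2-by-2 closed form $(M + sI)/t$ with $s = \sqrt{\det M}$ and $t = \sqrt{\mathrm{tr}\, M + 2s}$ rather than a general eigenvalue computation.

```python
import math


def matmul(a, b):
    """Product of two 2-by-2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def sqrt_psd_2x2(m):
    """PSD square root of a real symmetric PSD 2-by-2 matrix."""
    (p, q), (_, r) = m
    s = math.sqrt(max(p * r - q * q, 0.0))
    t = math.sqrt(p + r + 2.0 * s)
    if t == 0.0:
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(p + s) / t, q / t], [q / t, (r + s) / t]]


def polar_2x2(a):
    """Return (P, U) with A = P U, P positive definite, U orthogonal.

    Assumes `a` is real and nonsingular, so that P is invertible.
    """
    p = sqrt_psd_2x2(matmul(a, transpose(a)))      # P = (A A^T)^{1/2}
    det_p = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    p_inv = [[p[1][1] / det_p, -p[0][1] / det_p],
             [-p[1][0] / det_p, p[0][0] / det_p]]
    return p, matmul(p_inv, a)                     # U = P^{-1} A


# Hypothetical sample matrix: a rotation by 90 degrees followed by a scaling.
A = [[0.0, -2.0], [1.0, 0.0]]
P, U = polar_2x2(A)
PU = matmul(P, U)
UUt = matmul(U, transpose(U))
```

For this sample matrix $P$ is the diagonal scaling $\mathrm{diag}(2, 1)$ and $U$ is the rotation, which mirrors the scalar factorization $z = pu$ into a modulus and a factor of modulus 1.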


7.3.2 Theorem. Let $A \in M_{m,n}$ with $m \le n$. Then $A$ may be written as

$$A = PU$$

where $P \in M_m$ is positive semidefinite, rank $P$ = rank $A$, and $U \in M_{m,n}$ has orthonormal rows (that is, $UU^* = I$). The matrix $P$ is always uniquely determined as $P = (AA^*)^{1/2}$, and $U$ is uniquely determined when $A$ has rank $m$. If $A$ is real, then both $P$ and $U$ may be taken to be real.


Proof: Use (7.3.1) to write $A = X\Lambda Y = X\Lambda X^*XY$ and set $P \equiv X\Lambda X^*$ and $U \equiv XY$. Then $P$ is positive semidefinite, and $UU^* = XYY^*X^* = XIX^* = XX^* = I$, so $U$ has orthonormal rows. By the construction in (7.3.1), $P = (AA^*)^{1/2}$, and in general if $A = PU$, then $AA^* = PUU^*P = P^2$, so $P$ must always be the (unique) positive semidefinite square root of $AA^*$. If $A$ has rank $m$, then $P$ is nonsingular, and $U = P^{-1}A$ is uniquely determined. As we saw in (7.3.1), however, if rank $A < m$, then the rows of $Y$ corresponding to the 0 eigenvalues of $P$ are not uniquely determined, so $U = XY$ need not be uniquely determined when rank $A < m$. □


An important special case follows immediately.


7.3.3 Corollary. If $A \in M_n$, then it may be written in the form

$$A = PU$$

where $P$ is positive semidefinite and $U$ is unitary. The matrix $P$ is always uniquely determined as $P = (AA^*)^{1/2}$; if $A$ is nonsingular, then $U$ is uniquely determined as $U = P^{-1}A$. If $A$ is real, then $P$ and $U$ may be taken to be real.


Exercise. Show that Theorem (7.3.2) may be proved using the following limit argument. If $A$ is nonsingular, then set $P \equiv (AA^*)^{1/2}$, define $U \equiv P^{-1}A$, and check that $UU^* = I$. Thus, both $P$ and $U$ are uniquely determined. If $A$ is singular, consider $A_\epsilon \equiv A + \epsilon I$, $\epsilon > 0$, and form $A_\epsilon = P_\epsilon U_\epsilon$, where both factors are uniquely determined. Use the selection principle (2.1.8) to obtain a sequence $\epsilon_k \to 0$ as $k \to \infty$ such that $U_{\epsilon_k}$ is entry-wise convergent to a unitary matrix $U$ as $k \to \infty$. Since $P_{\epsilon_k} = A_{\epsilon_k}U_{\epsilon_k}^*$, we also have $P_{\epsilon_k} \to P$ and $A = PU$. Notice that this argument, while conceptually more economical than that given for (7.3.2) above, does not give a constructive procedure for obtaining the factors $P$ and $U$ when $A$ is singular.


The factorization (7.3.2) is known as the polar form or polar decomposition of the matrix $A$. We note that both factors are unique if $A$ has full rank.


Exercise. If $A \in M_{m,n}$ and $m \ge n$, show that it may be written as

$$A = WQ$$

where $W \in M_{m,n}$ has orthonormal columns (that is, $W^*W = I$) and $Q \in M_n$ is positive semidefinite. Hint: Factorize $A^*$ by (7.3.2).


Exercise. Let $x \in \mathbf{C}^n$ be a given nonzero vector and let $A = x \in M_{n,1}$. Show that the polar decomposition of $A$ is $A = x = \|x\|_2 u$, where $u \equiv x/\|x\|_2$.


Thus, the polar decomposition may be thought of as a generalization to matrices of the convenient factorization $x = \|x\|_2 (x/\|x\|_2)$ of nonzero vectors.


Exercise. Show that a square matrix $A$ may be written both as $A = PU$ and as $A = WQ$, where $P = (AA^*)^{1/2}$ and $Q = (A^*A)^{1/2}$. These are sometimes called “left” and “right” polar decompositions of $A$. Show that the uniquely determined positive semidefinite factors $P$ and $Q$ are equal if and only if $A$ is normal. It is a fact that if $A$ is nonsingular, then the uniquely determined unitary factors $U$ and $W$ are always equal [exercise preceding Theorem (7.3.6)].


Exercise. Not every square matrix is normal; that is, it is not always true that $AA^* = A^*A$. But $AA^*$ is always unitarily similar to $A^*A$. Use the polar decomposition (7.3.3) to prove this.


7.3.4 Theorem. Let $A \in M_n$ and let $A = PU$ be a polar decomposition. Then $A$ is normal if and only if $PU = UP$.


Proof: If $P$ and $U$ commute, then $AA^* = PUU^*P^* = PP = P^2$ and $A^*A = U^*P^*PU = U^*P^2U = U^*UP^2 = P^2$, and so $A$ is normal. If $A$ is normal, then $P^2 = U^*P^2U$. Observe that $P^2$ and $U^*P^2U$ are both positive semidefinite square matrices with obvious respective positive semidefinite square roots $P$ and $U^*PU$. But Theorem (7.2.6) says that such a square root is unique, so $P = U^*PU$, or $UP = PU$. □


Our next goal is to deduce the singular value decomposition of an arbitrary (not necessarily square) matrix from (7.3.1).


7.3.5 Theorem. If $A \in M_{m,n}$ has rank $k$, then it may be written in the form

$$A = V\Sigma W^*$$

where $V \in M_m$ and $W \in M_n$ are unitary. The matrix $\Sigma = [\sigma_{ij}] \in M_{m,n}$ has $\sigma_{ij} = 0$ for all $i \ne j$, and $\sigma_{11} \ge \sigma_{22} \ge \cdots \ge \sigma_{kk} > \sigma_{k+1,k+1} = \cdots = \sigma_{qq} = 0$, where $q = \min\{m, n\}$. The numbers $\{\sigma_{ii}\} \equiv \{\sigma_i\}$ are the nonnegative square roots of the eigenvalues of $AA^*$, and hence are uniquely determined. The columns of $V$ are eigenvectors of $AA^*$ and the columns of $W$ are eigenvectors of $A^*A$ (arranged in the same order as the corresponding eigenvalues $\sigma_i^2$). If $m \le n$ and if $AA^*$ has distinct eigenvalues, then $V$ is determined up to a right diagonal factor $D = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_m})$ with all $\theta_i \in \mathbf{R}$; that is, if $A = V_1\Sigma W_1^* = V_2\Sigma W_2^*$, then $V_2 = V_1D$. If $m < n$, then $W$ is never uniquely determined; if $n = m = k$ and $V$ is given, then $W$ is uniquely determined. If $n \le m$, the uniqueness of $V$ and $W$ is determined by considering $A^*$. If $A$ is real, then $V$, $\Sigma$, and $W$ may all be taken to be real.


Proof: We assume without loss of generality that $m \le n$ (otherwise, replace $A$ by $A^*$). Use (7.3.1) to write $A = X\Lambda Y$ with $X, \Lambda \in M_m$ and $Y \in M_{m,n}$. Set $V \equiv X$, take $\Sigma \equiv [\Lambda\ \ 0] \in M_{m,n}$ and define $W \equiv [Y^*\ \ S^*] \in M_n$ by requiring that the columns of $W$ be an orthonormal set in $\mathbf{C}^n$. The columns of $Y^*$ are already orthonormal, so if $m < n$, the columns of $S^* \in M_{n,n-m}$ may be chosen (but not uniquely) to make $W$ be unitary. It is immediate that $V\Sigma W^* = X\Lambda Y = A$. The statements about uniqueness follow from the corresponding assertions in (7.3.1). □


The “diagonal entries” $\sigma_i \equiv \sigma_{ii}$, $i = 1, \ldots, q = \min\{m, n\}$ of $\Sigma$ are known as the singular values of $A \in M_{m,n}$ (sometimes only the nonzero ones are so termed), and the columns of $V$ and the columns of $W$ are the (respectively, left and right) singular vectors of $A$. The factorization (7.3.5) is known as the singular value decomposition of $A$. The polar matrix $P$ is the unique positive semidefinite square root of $AA^*$, and the singular values $\sigma_i$ are the nonnegative square roots of the eigenvalues of $AA^*$, so the singular values of $A$ are the same as the eigenvalues of the polar matrix $P$. While it is convenient to arrange the singular values in decreasing order, this is not a universal convention in the singular value decomposition; it is the set of singular values that is uniquely determined by $A$.

Notice that the singular value decomposition is a natural generalization to arbitrary matrices of the unitary diagonalization of normal matrices. For this reason, it is often the case that facts about eigenvalues of normal matrices generalize to statements about singular values of general matrices.


Exercise. Let $x \in \mathbf{C}^n$ be a given nonzero vector and let $A = x \in M_{n,1}$. Show that a singular value decomposition of $A$ is $A = x = V\Sigma W^*$, where $W = [1] \in M_1$, $\Sigma = [\|x\|_2, 0, \ldots, 0]^T \in M_{n,1}$, and $V = [v_1\ \ldots\ v_n] \in M_n$ has $v_1 = x/\|x\|_2$ and $v_2, \ldots, v_n$ are $n - 1$ arbitrary orthonormal vectors that are orthogonal to $x$.


If $A \in M_n$, the three factors $V$, $\Sigma$, and $W$ in the singular value decomposition are all $n$-by-$n$ matrices. If $A = PU$ is a polar decomposition of $A$, and if $P = V\Lambda V^*$ is a unitary diagonalization of $P$ in which the (necessarily nonnegative) eigenvalues of $P$ are arranged in nonincreasing order, then $A = PU = V\Lambda V^*U = (V)(\Lambda)(V^*U) = V\Sigma W^*$ is a singular value decomposition of $A$ with $V = V$, $\Sigma = \Lambda$, and $W = U^*V$. Notice that $AA^* = V\Sigma W^*W\Sigma V^* = V\Sigma^2V^*$, so that the columns of $V$ are eigenvectors of the Hermitian matrix $AA^*$ with corresponding eigenvalues $\sigma_1^2, \ldots, \sigma_n^2$.




Similarly, $A^*A = W\Sigma V^*V\Sigma W^* = W\Sigma^2W^*$, so the columns of $W$ are eigenvectors of $A^*A$.


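If $A \in M_n$ is normal, and if $A = V\Sigma W^*$ is a singular value decomposition, then $AA^* = A^*A$, and so $AA^*$ and $A^*A$ have the same eigenvectors. It does not follow from this that $V = W$ in a singular value decomposition of $A$, however, for then $A = V\Sigma V^*$ would necessarily be Hermitian (even positive semidefinite). If $A = U\Lambda U^*$ is a unitary diagonalization of $A$ and if $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, then each $\lambda_k = |\lambda_k|e^{i\theta_k}$ for some $\theta_k \in \mathbf{R}$; if $\lambda_k = 0$, choose $\theta_k = 0$. If we set $D \equiv \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$ and $|\Lambda| \equiv \mathrm{diag}(|\lambda_1|, \ldots, |\lambda_n|)$, then $\Lambda = |\Lambda|D$ and $A = U\Lambda U^* = U|\Lambda|DU^* = (U)(|\Lambda|)(U\bar{D})^* = V\Sigma W^*$ is a singular value decomposition of $A$ with $V = U$, $\Sigma = |\Lambda|$, and $W = U\bar{D}$.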

Thus, the singular values of a normal matrix are just the absolute values of the eigenvalues, the columns of $V$ are eigenvectors of $A$, and the columns of $W$ may be taken to be the same as the columns of $V$ except that each is multiplied by a complex scalar of absolute value 1, which is determined by the corresponding eigenvalue. If $A$ is Hermitian, then all the eigenvalues are real, $D = \bar{D}$, and $D = \mathrm{diag}(\mathrm{sgn}(\lambda_1), \ldots, \mathrm{sgn}(\lambda_n))$, where we set $\mathrm{sgn}(0) \equiv 1$. If $A$ is Hermitian and positive semidefinite, then $D = I$, $V = W = U$, and $\Lambda = \Sigma$.

One useful application of the Schur triangularization theorem (2.3.1) was to show that every square complex matrix is the limit of matrices with distinct eigenvalues. The singular value decomposition can be used to show that every complex matrix (square or not) is the limit of matrices with distinct singular values. This can be useful because of the partial uniqueness of the singular value decomposition in the case of distinct singular values.


Exercise. If A ∈ M_n is nonsingular, show that the following procedure yields a singular value decomposition A = VΣW*:

(a) Form the positive definite Hermitian matrix AA* and compute a unitary diagonalization AA* = UΛU* by finding the (positive) eigenvalues {λ_i} of AA* and a corresponding set {u_i} of normalized eigenvectors.

(b) Set Σ = Λ^{1/2} and V = U = [u_1 ... u_n].

(c) Set W = A*VΣ^{-1}.

Show that W is unitary and A = VΣW*. Hint: Compute W*W.
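
Steps (a)-(c) of this exercise can be sketched in a few lines. This is only an illustration under stated assumptions: NumPy's `eigh` is used for the unitary diagonalization of AA*, the function name `svd_via_gram` is hypothetical, and the test matrix is a random complex matrix, which is nonsingular with probability 1.

```python
import numpy as np

def svd_via_gram(A):
    """SVD of a nonsingular square A following steps (a)-(c) of the exercise."""
    # (a) Unitary diagonalization of the Hermitian positive definite matrix AA*.
    lam, U = np.linalg.eigh(A @ A.conj().T)
    # eigh returns eigenvalues in ascending order; reverse to get the
    # usual nonincreasing ordering of singular values.
    lam, U = lam[::-1], U[:, ::-1]
    # (b) Sigma = Lambda^{1/2} and V = U.
    sigma = np.sqrt(lam)
    V = U
    # (c) W = A* V Sigma^{-1}; dividing column j by sigma[j] applies Sigma^{-1}.
    W = (A.conj().T @ V) / sigma
    return V, sigma, W

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
V, sigma, W = svd_via_gram(A)

unitarity_error = np.linalg.norm(W.conj().T @ W - np.eye(4))
reconstruction_error = np.linalg.norm(A - V @ np.diag(sigma) @ W.conj().T)
```

Forming AA* squares the condition number, so for badly conditioned matrices this route is less accurate than a direct singular value routine; it is shown only to mirror the exercise.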


Exercise. If A ∈ M_n is given (not necessarily nonsingular), show that the following procedure yields a singular value decomposition A = VΣW*:

(a) There exists some c = c(A) > 0 such that A_ε = A + εI is nonsingular for all positive ε < c. Let 0 < ε < c.

(b) Use the procedure in the previous exercise to form a singular value decomposition A_ε = V_ε Σ_ε W_ε*.

(c) Use the selection principle (2.1.8) and let ε → 0 through a sequence of values ε_k such that

\[ \lim_{\varepsilon_k \to 0} V_{\varepsilon_k} = V \quad\text{and}\quad \lim_{\varepsilon_k \to 0} W_{\varepsilon_k} = W \]

both exist.

(d) Show that A = VΣW*, in which Σ = lim_{ε_k → 0} Σ_{ε_k}.
This argument, which can be used to prove the general singular value decomposition (7.3.5), guarantees that a singular value decomposition exists in general but does not give a constructive procedure for computing the factors in the singular value decomposition when A does not have full rank.

Exercise. Suppose A ∈ M_n is nonsingular and that A = PU, A = WQ are the left and right polar decompositions of A with positive definite P, Q ∈ M_n and unitary U, W ∈ M_n. Show that U = W always, but P = Q if and only if A is normal. If A is singular, show that there exist left and right polar decompositions of A for which U ≠ W. Hint: If A = VΣW* is a singular value decomposition of A, then neither V nor W is uniquely determined but A = (VW*)(WΣW*) = (VΣV*)(VW*); use the uniqueness part of (7.3.3). Consider A = 0 to show that the unitary factors of the two polar decompositions of A need not be the same if A is singular.

7.3.6 Corollary. If A ∈ M_{m,n} is given, and if ‖·‖ is a given norm on M_{m,n}, then for every ε > 0 there exists a matrix A_ε ∈ M_{m,n} with distinct singular values such that ‖A − A_ε‖ < ε.

Proof: Suppose m ≤ n. Let A = VΣW* be a singular value decomposition of A, and let

\[ \Sigma_\delta = [\,\operatorname{diag}(\sigma_1 + \delta,\ \sigma_2 + 2\delta,\ \dots,\ \sigma_m + m\delta)\ \ 0\,] \]

with 0 ∈ M_{m,n−m}. If all the singular values of A are equal, Σ_δ will have distinct diagonal entries for all δ > 0. If not, and if δ > 0 is chosen so that mδ is less than the smallest difference between successive distinct singular values, then Σ_δ will have distinct diagonal entries. In either event, Σ_δ → Σ as δ → 0. If we set A_δ = VΣ_δW*, then ‖A − A_δ‖_2 = ‖Σ − Σ_δ‖_2 → 0 as δ → 0 since the Frobenius norm is unitarily invariant. But all norms on M_{m,n} are equivalent, so we are done. The argument is similar if m > n. □
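
The perturbation used in the proof of Corollary (7.3.6) is easy to try out. The sketch below assumes NumPy and treats only the case m ≤ n; the helper name `perturb_to_distinct` and the test matrix (which has the repeated singular value 1) are illustrative assumptions.

```python
import numpy as np

def perturb_to_distinct(A, delta):
    """Return V Sigma_delta W*, where sigma_i is replaced by sigma_i + i*delta (case m <= n)."""
    m, n = A.shape
    assert m <= n
    V, s, Wh = np.linalg.svd(A)                  # A = V [diag(s) 0] Wh, with Wh = W*
    s_delta = s + delta * np.arange(1, m + 1)    # sigma_i + i * delta
    Sigma_delta = np.zeros((m, n))
    Sigma_delta[:, :m] = np.diag(s_delta)
    return V @ Sigma_delta @ Wh

# A 2-by-3 matrix whose two singular values are both equal to 1.
A = np.hstack([np.eye(2), np.zeros((2, 1))])
delta = 1e-3
A_delta = perturb_to_distinct(A, delta)

s_new = np.linalg.svd(A_delta, compute_uv=False)
distance = np.linalg.norm(A - A_delta)           # Frobenius norm of the perturbation
```

Here the Frobenius distance equals the norm of (δ, 2δ), that is δ√5, which tends to 0 with δ while the perturbed singular values stay distinct.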




There is a simple transformation that permits one to convert results about eigenvalues of Hermitian matrices into results about singular values of arbitrary matrices.

7.3.7 Theorem. Let A ∈ M_{m,n}, let q = min{m, n}, and define Ã ∈ M_{m+n} by

\[ \tilde{A} = \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} \tag{7.3.7a} \]

Let σ_1, σ_2, ..., σ_q be nonnegative real numbers. The singular values of A are σ_1, σ_2, ..., σ_q if and only if the m+n eigenvalues of Ã are σ_1, σ_2, ..., σ_q, −σ_1, −σ_2, ..., −σ_q, and |m−n| additional 0's.

Proof: Suppose m ≥ n and let A = VΣW* be a singular value decomposition of A. Write

\[ \Sigma = \begin{bmatrix} S \\ 0 \end{bmatrix} \in M_{m,n}, \qquad 0 \in M_{m-n,n} \]

where S = diag(σ_1, σ_2, ..., σ_n), and write the unitary factor V ∈ M_m as V = [V_1 V_2], where V_1 ∈ M_{m,n} and V_2 ∈ M_{m,(m−n)}. If we set V̂ = V_1/√2 and Ŵ = W/√2, then the matrix

\[ U = \begin{bmatrix} \hat{V} & -\hat{V} & V_2 \\ \hat{W} & \hat{W} & 0 \end{bmatrix} \in M_{m+n}, \qquad 0 \in M_{n,m-n} \]

is unitary and one can verify by a direct calculation that

\[ \tilde{A} = U \begin{bmatrix} S & 0 & 0 \\ 0 & -S & 0 \\ 0 & 0 & 0 \end{bmatrix} U^* \]

where the diagonal zero is an (m−n)-by-(m−n) matrix. The argument is similar if m < n. □

Exercise. Let A ∈ M_{m,n} be given. Show that the singular values of A*, A^T, and Ā are the same as those of A. If U ∈ M_m and V ∈ M_n are unitary, show that the singular values of UAV are the same as those of A. If c ∈ C, show that the singular values of cA are |c| times the singular values of A.

As an immediate application of Theorem (7.3.7) we have perturbation results for singular values of arbitrary matrices that follow from the corresponding results for Hermitian matrices. They show that every matrix is perfectly conditioned with respect to singular value computations; they should be compared with (6.3.2) and (6.3.4) and the discussion of the condition number between. For a generalization of these results to arbitrary unitarily invariant norms see (7.4.51).

7.3.8 Corollary. Let A, B ∈ M_{m,n}, let E = B − A, and let q = min{m, n}. If σ_1 ≥ σ_2 ≥ ··· ≥ σ_q are the singular values of A and τ_1 ≥ τ_2 ≥ ··· ≥ τ_q are the singular values of B, then

(a) |σ_i − τ_i| ≤ |||E|||_2 for all i = 1, 2, ..., q; and

(b) \[ \sum_{i=1}^{q} (\sigma_i - \tau_i)^2 \le \|E\|_2^2. \]

Proof: These two results are analogs of Weyl's inequality [(4.3.1); see also the exercise before (6.3.5)] and the Hoffman-Wielandt theorem for Hermitian matrices (6.3.8). They follow immediately from the stated results and (7.3.7). □

Exercise. Provide the details for the proof of (7.3.8). For (a), see Problem 36 of Section (5.6).

There is also an interlacing property for singular values; it follows from the interlacing property of eigenvalues of Hermitian matrices.

7.3.9 Theorem. Let A ∈ M_{m,n} be a given matrix and let Â be the matrix obtained by deleting any one column from A. Let {σ_i} denote the singular values of A and let {σ̂_i} denote the singular values of Â, both arranged in nonincreasing order.

(a) If m ≥ n, then

\[ \sigma_1 \ge \hat\sigma_1 \ge \sigma_2 \ge \hat\sigma_2 \ge \cdots \ge \hat\sigma_{n-1} \ge \sigma_n \ge 0 \]

(b) If m < n, then

\[ \sigma_1 \ge \hat\sigma_1 \ge \sigma_2 \ge \hat\sigma_2 \ge \cdots \ge \sigma_m \ge \hat\sigma_m \ge 0 \]

If a row of A is deleted instead of a column, the appropriate inequalities are obtained by interchanging m and n in (a) and (b).

Proof: The squares of the singular values of A are the eigenvalues of the Hermitian matrix A*A ∈ M_n, and the squares of the singular values of Â are the eigenvalues of Â*Â ∈ M_{n−1}, which is a principal submatrix of A*A if a column of A is deleted. The interlacing inequalities follow directly from the inclusion principle (4.3.15). If a row of A is deleted instead of a column, consider AA* and ÂÂ* instead. □
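
Theorem (7.3.7) and the two bounds of Corollary (7.3.8) lend themselves to a quick numerical check. The following sketch assumes NumPy; the dimensions, the random matrices, and the size of the perturbation E are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# The Hermitian matrix of (7.3.7a): [[0, A], [A*, 0]] in M_{m+n}.
A_tilde = np.block([[np.zeros((m, m)), A],
                    [A.conj().T, np.zeros((n, n))]])
eigs = np.linalg.eigvalsh(A_tilde)                 # ascending order
sigma = np.linalg.svd(A, compute_uv=False)         # nonincreasing order

# Spectrum predicted by the theorem: sigma_i, -sigma_i, and |m - n| zeros.
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(abs(m - n))]))

# Corollary (7.3.8): compare the singular values of A and of B = A + E.
E = 1e-3 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
tau = np.linalg.svd(A + E, compute_uv=False)
max_shift = np.max(np.abs(sigma - tau))            # left side of (a)
spectral_E = np.linalg.norm(E, 2)                  # spectral norm of E
sum_sq_shift = np.sum((sigma - tau) ** 2)          # left side of (b)
frobenius_E_sq = np.linalg.norm(E, 'fro') ** 2     # squared Frobenius norm of E
```

Both bounds hold for every perturbation E, large or small; the small E used here only makes the comparison with the unperturbed values easy to read.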




As a final similarity between properties of eigenvalues of Hermitian matrices and properties of singular values, we have the following analog of the Courant-Fischer theorem (4.2.11).

7.3.10 Theorem. Let A ∈ M_{m,n}, let q = min{m, n}, let σ_1 ≥ σ_2 ≥ ··· ≥ σ_q be the ordered singular values of A, and let k be a given integer with 1 ≤ k ≤ q. Then

\[ \min_{w_1,\dots,w_{k-1}\in\mathbb{C}^n}\ \ \max_{\substack{x\ne 0,\ x\in\mathbb{C}^n \\ x\perp w_1,\dots,w_{k-1}}} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_k \]

and

\[ \max_{w_1,\dots,w_{n-k}\in\mathbb{C}^n}\ \ \min_{\substack{x\ne 0,\ x\in\mathbb{C}^n \\ x\perp w_1,\dots,w_{n-k}}} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_k \]

Proof: These formulae follow immediately from (4.2.12) and (4.2.13), since σ_k^2(A) is an eigenvalue of A*A. If λ_1 ≤ λ_2 ≤ ··· ≤ λ_n are the ordered eigenvalues of the Hermitian matrix A*A, then σ_k^2(A) = λ_{n−k+1}(A*A) and (4.2.12) says that

\[ \sigma_k^2(A) = \lambda_{n-k+1}(A^*A) = \min_{w_1,\dots,w_{k-1}\in\mathbb{C}^n}\ \ \max_{\substack{x\ne 0,\ x\in\mathbb{C}^n \\ x\perp w_1,\dots,w_{k-1}}} \frac{x^*A^*Ax}{x^*x} = \min_{w_1,\dots,w_{k-1}\in\mathbb{C}^n}\ \ \max_{\substack{x\ne 0,\ x\in\mathbb{C}^n \\ x\perp w_1,\dots,w_{k-1}}} \left(\frac{\|Ax\|_2}{\|x\|_2}\right)^{2} \]

The second identity is proved in the same way. □
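
The first identity in Theorem (7.3.10) can be explored numerically: for any choice of w_1, ..., w_{k−1} the inner maximum is at least σ_k, and it equals σ_k when the w_i are the first k−1 right singular vectors. The sketch below assumes NumPy; the helper `constrained_max` is a hypothetical name, and it evaluates the inner maximum by restricting A to an orthonormal basis of the orthogonal complement of the w_i.

```python
import numpy as np

def constrained_max(A, Wc):
    """max ||Ax||_2 / ||x||_2 over nonzero x orthogonal to the columns of Wc."""
    if Wc.shape[1] == 0:
        return np.linalg.norm(A, 2)
    # A full SVD of Wc gives an orthonormal basis N of the orthogonal
    # complement of the column space of Wc.
    Uw, sw, _ = np.linalg.svd(Wc, full_matrices=True)
    r = int(np.sum(sw > 1e-12))
    N = Uw[:, r:]
    # Writing x = N y (so ||x|| = ||y||) turns the problem into a spectral norm.
    return np.linalg.norm(A @ N, 2)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
k = 3
_, sigma, Wh = np.linalg.svd(A)

W_opt = Wh.conj().T[:, :k - 1]          # first k-1 right singular vectors
value_at_optimum = constrained_max(A, W_opt)

# Random constraint vectors can only give values that are at least sigma_k.
random_values = [constrained_max(A, rng.standard_normal((6, k - 1)))
                 for _ in range(20)]
```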


Problems

1. Let P ∈ M_n be positive semidefinite. Show that P can be written as a polynomial in P^2, so if a given matrix U commutes with P^2, it must also commute with P. Use this to show that if A ∈ M_n is normal, then its polar factors P and U commute.

2. Show that any A ∈ M_n can be written as A = Pe^{iH}, where P, H ∈ M_n, P is positive semidefinite, and H is Hermitian. Show that H can be taken to be positive definite. To what extent are P and H determined by A? Hint: If U ∈ M_n is unitary, and if U = VΛV* is a unitary diagonalization of U, then Λ = e^{iD}, where D is a diagonal matrix with real main diagonal entries. What is e^{iVDV*}?



3. Show that A ∈ M_n has a zero singular value if and only if it has a zero eigenvalue.

4. Let A ∈ M_{m,n} and let q = min{m, n}. Show that the largest singular value of A is equal to the spectral norm of A. Show that the Frobenius norm of A satisfies the identity

\[ \|A\|_2 = \Bigl( \sum_{i=1}^{q} \sigma_i^2 \Bigr)^{1/2} \]

Show that σ_1 ≤ ‖A‖_2 ≤ √n σ_1 and identify the cases of equality. Conclude that

\[ |||A|||_2 \le \|A\|_2 \le \sqrt{n}\,|||A|||_2 \quad\text{for all } A \in M_n \tag{7.3.11} \]

Show that these bounds are sharp by considering I and

\[ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \]

5. If k ≤ min{m, n} and v_k is the kth column of V and w_k is the kth column of W in a singular value decomposition (7.3.5) of A, show that

\[ A^* v_k = \sigma_k w_k \quad\text{and}\quad A w_k = \sigma_k v_k \]

where σ_k is the kth singular value of A. In particular, v_k*Aw_k = σ_k.


6. If one is given a large matrix A, how does one go about computing the rank of A numerically? Notice that the rank of A is equal to the number of nonzero singular values of A, so one way to compute the rank of A numerically is to normalize A to A_1 = A/‖A‖ for some conveniently calculated norm ‖·‖, then compute a singular value decomposition for A_1 and take the rank of A to be the number of singular values of A_1 that are larger than some threshold. Why would you expect the numerical determination of the rank of A to be easier and more accurate if the ratio of the smallest to largest nonzero singular values of A is not near 0?
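
The procedure described in Problem 6 can be sketched as follows. This is an illustration under stated assumptions: NumPy is used, the Frobenius norm plays the role of the conveniently calculated norm, and the threshold `tol` and the function name `numerical_rank` are hypothetical choices rather than values taken from the text.

```python
import numpy as np

def numerical_rank(A, tol=1e-10):
    """Count the singular values of A / ||A|| that exceed tol."""
    norm_A = np.linalg.norm(A)          # Frobenius norm, cheap to compute
    if norm_A == 0:
        return 0
    s = np.linalg.svd(A / norm_A, compute_uv=False)
    return int(np.sum(s > tol))

rng = np.random.default_rng(4)
# A 6-by-5 matrix of rank 2, and a copy with a tiny perturbation that gives
# it full rank in exact arithmetic.
B = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
B_noisy = B + 1e-13 * rng.standard_normal((6, 5))

rank_B = numerical_rank(B)
rank_B_noisy = numerical_rank(B_noisy)
```

With this threshold both matrices are assigned rank 2; the answer is insensitive to the choice of threshold precisely because the nonzero singular values are well separated from the noise level.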


7. Let A ∈ M_{m,n} have singular value decomposition A = VΣW* and define A† = WΣ†V*, where Σ† is the transpose of Σ in which the positive singular values of A are replaced by their reciprocals. Show that

(a) AA† and A†A are Hermitian;

(b) AA†A = A; and

(c) A†AA† = A†.

Show that A† = A^{-1} if A is square and nonsingular. The matrix A† is called the Moore-Penrose generalized inverse of A. It exists for any matrix A, even for a singular square A and for a nonsquare A. Show further that A† is uniquely determined by the above requirements (a)-(c).
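
The definition in Problem 7 translates directly into code. The following minimal sketch assumes NumPy; the function name `pinv_via_svd` and the absolute cutoff `tol` used to decide which singular values count as positive are assumptions made for illustration.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Generalized inverse W Sigma-dagger V* built from a full SVD of A."""
    m, n = A.shape
    V, s, Wh = np.linalg.svd(A)               # A = V Sigma Wh, with Wh = W*
    k = int(np.sum(s > tol))                  # singular values treated as positive
    Sigma_dag = np.zeros((n, m))              # transpose shape of Sigma
    Sigma_dag[:k, :k] = np.diag(1.0 / s[:k])  # reciprocals of the positive ones
    return Wh.conj().T @ Sigma_dag @ V.conj().T

rng = np.random.default_rng(5)
# A 4-by-3 matrix of rank 2, so that neither A*A nor AA* is invertible.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
A_dag = pinv_via_svd(A)

# Residuals of the requirements (a)-(c).
herm_1 = np.linalg.norm(A @ A_dag - (A @ A_dag).conj().T)
herm_2 = np.linalg.norm(A_dag @ A - (A_dag @ A).conj().T)
cond_b = np.linalg.norm(A @ A_dag @ A - A)
cond_c = np.linalg.norm(A_dag @ A @ A_dag - A_dag)

# For a square nonsingular matrix the construction returns the ordinary inverse.
S = rng.standard_normal((3, 3))
inverse_gap = np.linalg.norm(pinv_via_svd(S) - np.linalg.inv(S))
```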


8. A least-squares solution to the linear equations Ax = b is a vector x such that ‖x‖_2 is minimized among all vectors x for which ‖Ax − b‖_2 is minimal. Show that x = A†b is a least-squares solution to Ax = b.




9. Show that A† = lim_{t→0} A*(AA* + tI)^{-1}, where A† is defined in Problem 7.

10. The singular value decomposition (7.3.5) can be derived without explicit use of eigenvectors and eigenvalues. The (left and right) singular vectors and the singular values can be constructed directly from the variational characterization of the spectral norm. Consider A ∈ M_n and the variational characterization (**) |||A|||_2 = max{‖Ax‖_2 : ‖x‖_2 = 1}. (a) Let n ≥ 2 and let B ∈ M_n have the special form

\[ B = \begin{bmatrix} \sigma_1 & w^* \\ 0 & X \end{bmatrix} \]

where σ_1 = |||B|||_2, w ∈ C^{n−1}, and X ∈ M_{n−1}. Show that w = 0. Hint: If σ_1 > 0, consider ζ = [σ_1; w]/(σ_1^2 + w*w)^{1/2} (the column vector with first entry σ_1 followed by w), show that ‖Bζ‖_2^2 ≥ σ_1^2 + w*w, and use (**). (b) Let A ∈ M_n, let σ_1 = |||A|||_2, and use (**) to show that there is some unit vector x_1 such that ‖Ax_1‖_2 = σ_1. Let y_1 = σ_1^{-1}Ax_1. (c) Let W_1, V_1 ∈ M_n be unitary matrices whose first columns are x_1 and y_1, respectively. Show that V_1*AW_1 has spectral norm σ_1 and has the form of the matrix in (a). Conclude that

\[ V_1^* A W_1 = \begin{bmatrix} \sigma_1 & 0 \\ 0 & X \end{bmatrix}. \]

(d) Formulate an induction procedure for deflating A by introducing additional columns and rows of off-diagonal zeroes by pre- and postmultiplication by unitary matrices, and obtain the singular value decomposition of A. (e) What if A ∈ M_{m,n} is not square?

11. Let A = VΣW* be a singular value decomposition of the matrix A ∈ M_{m,n}. Suppose A has rank k, and let q = min{m, n}. Show that the last n−k columns of W form an orthonormal basis for the null space of A and that the first k columns of V form an orthonormal basis for the range of A.

12. Let A ∈ M_{m,n} and B ∈ M_{p,n}. Show that an orthonormal basis for the intersection of the null spaces of A and B is given by the last several (how many?) columns of W, where VΣW* is a singular value decomposition of the partitioned matrix [A; B] ∈ M_{(m+p),n} (A stacked on top of B). Hint: When is [A; B]x = 0 for x ∈ C^n? How can you find an orthonormal basis for the intersection of the null spaces of k matrices A_1, A_2, ..., A_k, each having the same number of columns?

13. Show that the polar decomposition (7.3.2) and the singular value decomposition (7.3.5) are equivalent in the sense that each is easily derived from the other. Hint: Apply the spectral theorem to P.

14. Let A ∈ M_n. Show that A is diagonalizable if and only if there is a positive definite Hermitian matrix P such that P^{-1}AP is normal. Hint: If A = SΛS^{-1}, apply the polar decomposition (7.3.3) to S.

15. Use the singular value decomposition (7.3.5) (especially the statements about uniqueness of the decomposition) and Corollary (7.3.6) to prove Takagi's representation (4.4.4) for a complex symmetric matrix. Hint: If A = A^T ∈ M_n has distinct singular values and if A = VΣW*, then A = A^T = W̄ΣV^T. But then there exists a diagonal unitary matrix D = diag(e^{iθ_1}, ..., e^{iθ_n}) such that W̄ = VD, so A = VΣW* = VΣ(V̄D̄)* = VΣDV^T = (VD^{1/2})Σ(VD^{1/2})^T = UΣU^T. For the general case, use (7.3.6) and the selection principle (2.1.8) to perturb and pass to the limit.

16. Let A, B ∈ M_{m,n}, let q = min{m, n}, let the ordered singular values of A be σ_1(A) ≥ ··· ≥ σ_q(A) ≥ 0, and similarly for B and A+B. Let Ã, B̃, (A+B)~ ∈ M_{m+n} be the Hermitian matrices defined as in (7.3.7a). Show that σ_k(A) = λ_{m+n−k+1}(Ã) for k = 1, 2, ..., q and similarly for B and A+B. Be careful: The singular values are arranged in decreasing order and the eigenvalues of the Hermitian matrix Ã are arranged in increasing order. Use this identity and Weyl's theorem (4.3.7) to show that

\[ \sigma_{i+j-1}(A+B) \le \sigma_i(A) + \sigma_j(B), \qquad 1 \le i, j \le q \ \text{ and } \ i+j \le q+1 \]

In particular, σ_1(A+B) ≤ σ_1(A) + σ_1(B) (why is this not surprising?), and σ_q(A+B) ≤ min{σ_q(A) + σ_1(B), σ_1(A) + σ_q(B)}.

17. Consider

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \]

to show that the inequality σ_i(A+B) ≤ σ_i(A) + σ_i(B) is not true for all i = 1, 2, ..., where {σ_i(A)} and {σ_i(B)} are the singular values of A and B, respectively, both arranged in decreasing order.

18. Let A, B ∈ M_{m,n} be given, let q = min{m, n}, let the ordered singular values of A be σ_1(A) ≥ ··· ≥ σ_q(A) ≥ 0 and similarly for B and AB* ∈ M_m. Show that

\[ \sigma_{i+j-1}(AB^*) \le \sigma_i(A)\,\sigma_j(B), \qquad 1 \le i, j \le q, \quad i+j \le q+1 \]

These inequalities may be thought of as a multiplicative analog of the additive inequalities in Problem 16 as well as a generalization of the submultiplicative property of the spectral norm when m = n. Why? Hint: Let AB* = WQ be a left polar decomposition of AB*, with unitary W ∈ M_m and positive semidefinite Q ∈ M_m. Show that (x*Qx)^2 = (x*W*AB*x)^2 = [(A*Wx)*(B*x)]^2 ≤ ‖A*Wx‖_2^2 ‖B*x‖_2^2 = [(Wx)*AA*(Wx)](x*BB*x) for any x ∈ C^m. Let z_1, ..., z_{i−1} be orthonormal eigenvectors of AA* corresponding to the i−1 largest eigenvalues σ_1^2(A), ..., σ_{i−1}^2(A) of AA*, let y_1, ..., y_{j−1} be orthonormal eigenvectors of BB* corresponding to the j−1 largest eigenvalues σ_1^2(B), ..., σ_{j−1}^2(B) of BB*, and let x_1 = W*z_1, x_2 = W*z_2, ..., x_{i−1} = W*z_{i−1}, x_i = y_1, x_{i+1} = y_2, ..., x_{i+j−2} = y_{j−1}. If x is orthogonal to x_k for k = 1, 2, ..., i+j−2, then both (Wx)*AA*(Wx) ≤ σ_i^2(A)‖x‖_2^2 and x*BB*x ≤ σ_j^2(B)‖x‖_2^2, so under these constraints we have (x*Qx)^2 ≤ σ_i^2(A)σ_j^2(B)‖x‖_2^4. Now invoke the Courant-Fischer theorem (4.2.11) to conclude that

\[ \sigma_{i+j-1}^2(AB^*) = \bigl(\lambda_{m-i-j+2}\bigl([(AB^*)^*(AB^*)]^{1/2}\bigr)\bigr)^2 \le \sigma_i^2(A)\,\sigma_j^2(B) \]

19. Although the eigenvalues of AB and BA are always the same if A, B ∈ M_n, consider the examples lo | and lo | to show that the singular values of AB and BA need not be the same. Show, however, that the singular values of AB and B*A* are always the same.

20. Let X be an n-dimensional random vector whose components have zero means and finite variances. Let Σ = Cov(X) = E(XX*) [see (4.5.3*)], assume that Σ is nonsingular, let P = Σ^{1/2}, and let A, B ∈ M_n be given. The random vectors AX and BX have the same (zero) mean vectors, but there is no reason to expect that they have the same covariance matrices. Show that Cov(AX) = Cov(BX) if and only if A = B(PUP^{-1}) for some unitary matrix U ∈ M_n. Hint: If AΣA* = BΣB*, then (AP)(AP)* = (BP)(BP)*. If RW is a polar decomposition of BP, show that RV is a polar decomposition of AP for some unitary W, V ∈ M_n. What is R? Conclude that A = B(PW*VP^{-1}) = B(PUP^{-1}). To what extent is U determined? What if Σ = I? What if B = I?

21. Consider the matrix A_ε ∈ M_n given by

\[ A_\varepsilon = \begin{bmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & 0 & 1 \\ \varepsilon & & 0 & 0 \end{bmatrix}, \qquad \varepsilon > 0 \]

Show that the characteristic polynomial of A_ε is t^n − ε. Hint: Compute det(tI − A_ε) by a Laplace cofactor expansion along the first column. Show that the eigenvalues of A_ε are the n choices of ε^{1/n}. Show that the singular values of A_ε are 1, with multiplicity n−1, and ε. Now let n = 10, ε = 10^{-10} and observe that the perturbation A_0 → A_ε results in a .1 perturbation of the eigenvalues of A_0, but only a 10^{-10} perturbation of any singular value of A_0. What is the spectral condition number of A_ε? This is an example of the assertion following Theorem (7.3.7) that every matrix is well conditioned with respect to singular value computations, whereas a given matrix may be poorly conditioned with respect to eigenvalue computations.

22. Let A = [a_ij] ∈ M_n be given. Show that if A has a "small" row or column, then it must also have a "small" singular value. More precisely, let A = [r_1 r_2 ... r_n]^T, where r_i ∈ C^n and r_i^T is the ith row of A. Place the set of Euclidean norms of the rows {‖r_i‖_2 : i = 1, ..., n} in increasing order and denote the resulting ordered values by R_1 ≤ R_2 ≤ ··· ≤ R_n. Show that

\[ \sum_{j=1}^{k} \sigma_{n-j+1}^2 \le \sum_{j=1}^{k} R_j^2 \qquad\text{for } k = 1, 2, \dots, n \]

with a similar upper bound involving the norms of the columns. Recall that the singular values are ordered with σ_n ≤ σ_{n−1} ≤ ··· ≤ σ_1. Hint: The squared singular values are the eigenvalues of the Hermitian matrix AA*. What are the main diagonal entries of AA*? Use majorization and Theorem (4.3.26). Consider A*A for the column sum inequalities. Compare with Problem 19 of Section (4.3).

23. There is a natural analog of the singular value decomposition in which the unitary factors are replaced by complex orthogonal factors. Unlike the singular value decomposition, however, this factorization cannot always be achieved; recall from Problem 7 of Section (2.3) that the orthogonal analog of the Schur unitary upper triangular factorization cannot always be achieved either. If A ∈ M_{m,n} can be written in the form A = PΛQ^T, where P ∈ M_m and Q ∈ M_n are complex orthogonal and Λ = [λ_ij] ∈ M_{m,n} is "diagonal" in the sense that λ_ij = 0 if i ≠ j, show that A^TA ∈ M_n is diagonalizable and rank A = rank A^TA. These two conditions are also sufficient to ensure the existence of the indicated factorization A = PΛQ^T. What does this say if A is real? Give an example of an A ∈ M_2 that cannot be written as A = PΛQ^T with complex orthogonal P, Q ∈ M_2 and diagonal Λ ∈ M_2.

24. Explain why the singular value decomposition may be thought of as a generalization of the spectral theorem for normal matrices.

25. Theorem (2.5.5) on simultaneous unitary diagonalization of a family of normal matrices has an analog for the singular value decomposition. Let F = {A_i : i ∈ ℐ} ⊂ M_{m,n} and suppose there are unitary matrices V ∈ M_m and W ∈ M_n such that every V*A_iW is "diagonal" in the sense of Problem 23; that is, its i,j entry is 0 if i ≠ j. Show that (a) Each A_i*A_j ∈ M_n is normal and G = {A_iA_j* : i, j ∈ ℐ} ⊂ M_m is a commuting family. (b) A_iA_j*A_k = A_kA_j*A_i for every i, j, k ∈ ℐ. Each of these necessary conditions is also sufficient for the family F to have a simultaneous factorization of the form of the singular value decomposition.
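
The contrast described in Problem 21 between the sensitivity of eigenvalues and that of singular values can be observed numerically. The sketch below assumes NumPy and uses the values n = 10 and ε = 10^{-10} from that problem; the variable names are illustrative.

```python
import numpy as np

n, eps = 10, 1e-10
A0 = np.diag(np.ones(n - 1), k=1)      # ones on the superdiagonal, zeros elsewhere
A_eps = A0.copy()
A_eps[n - 1, 0] = eps                  # the entry epsilon in the lower-left corner

# Every eigenvalue of A0 is 0; those of A_eps are the n-th roots of eps.
eig_moduli = np.abs(np.linalg.eigvals(A_eps))

# Singular values of A0 are 1 (n-1 times) and 0; those of A_eps are 1 and eps.
sv_0 = np.linalg.svd(A0, compute_uv=False)
sv_eps = np.linalg.svd(A_eps, compute_uv=False)
sv_shift = np.max(np.abs(sv_eps - sv_0))
```

The eigenvalues move from 0 to modulus about 0.1, while no singular value moves by more than about 10^{-10}.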


26. Finding a simultaneous factorization of two given matrices A, B ∈ M_{m,n} of the form of the singular value decomposition is an interesting special case of the preceding problem. Show that there are unitary matrices V ∈ M_m, W ∈ M_n such that A = VΣW* and B = VΛW* with Σ, Λ ∈ M_{m,n} "diagonal" if and only if AB* and B*A are both normal. Hint: To show that the condition is sufficient, argue that it suffices to consider only the case in which A = Σ is nonnegative and "diagonal." If the equal diagonal entries of Σ are grouped together, show that if ΣB* and B*Σ are normal, then B is a partitioned block diagonal matrix with all but perhaps one (if A is singular) normal block. For each block, use either the spectral theorem for normal matrices or the singular value decomposition to obtain the conclusion.

27. If we wish to have unitary matrices V ∈ M_m and W ∈ M_n such that every member of the family F = {A_i : i ∈ ℐ} ⊂ M_{m,n} can be written as A_i = VΣ_iW* with each Σ_i "diagonal," show that it is necessary, but not sufficient when there are three or more matrices in the family, that A_iA_j* ∈ M_m and A_j*A_i ∈ M_n be normal for every i, j ∈ ℐ. Hint: Consider the family

{fo thls obli of

Explain what part of the proof in the case of two matrices does not work when there are more than two.

Further Readings and Notes. Sylvester proved the singular value decomposition for real square matrices in 1889. What seems to be the first proof of the singular value decomposition for general m-by-n complex matrices is in C. Eckart and G. Young, "A Principal Axis Transformation for Non-Hermitian Matrices," Bull. Amer. Math. Soc. 45 (1939), 118-121. Eckart and Young's paper also contains the result that two matrices A, B ∈ M_{m,n} have a simultaneous factorization of the form of the singular value decomposition in which the respective "diagonal" factors are both real if and only if AB* and B*A are both Hermitian. For a survey of results and further references on simultaneous factorization of families in the form of the singular value decomposition, see P. M. Gibson, "Simultaneous Diagonalization of Rectangular Complex Matrices," Linear Algebra Appl. 9 (1974), 45-53.

7.4 Examples and applications of the singular value decomposition

There are many applications of the polar form and the singular value decomposition. Some are given in the problems, and several are discussed in the following examples.

7.4.1 Example. If A ∈ M_n is a given invertible matrix, then all matrices that are sufficiently close to A (with respect to any norm) are invertible too. In some statistical modeling problems it is required to find a "nearest singular matrix" to A in the sense of least squares; that is, we want to find a matrix B such that A+B is singular and ‖B‖_2 is as small as possible. Let |||·||| be any matrix norm, and consider A+B = A(I + A^{-1}B), which we assume to be singular. If |||A^{-1}B||| < 1, then I + A^{-1}B, and hence A+B, would be invertible by (5.6.16). Thus, 1 ≤ |||A^{-1}B||| ≤ |||A^{-1}||| |||B|||, so if A+B is singular and A is invertible, we must have |||B||| ≥ 1/|||A^{-1}|||. If we choose for |||·||| the spectral norm, and if A = VΣW* is a singular value decomposition of A, then |||A^{-1}|||_2 = |||WΣ^{-1}V*|||_2 = |||Σ^{-1}|||_2 = 1/σ_n, where σ_n is the smallest singular value of A. Then, any B such that A+B is singular must satisfy |||B|||_2 ≥ σ_n(A). But if we choose for B the matrix B = VEW*, where E = diag(0, 0, ..., 0, −σ_n), then |||B|||_2 = |||E|||_2 = σ_n = ‖E‖_2 = ‖B‖_2 and A+B is singular (and has rank n−1).

More generally, if we want to find a "nearest rank k matrix" to a given matrix A, singular or nonsingular, with respect to the Frobenius norm, we may choose A+B, where B = VEW* as before, but E = diag(0, ..., 0, −σ_{k+1}, ..., −σ_n). See Problem 1 at the end of this section for a proof and Example (7.4.52) for a generalization of this result from the Frobenius norm to all unitarily invariant norms.

The case k = 1 occurs frequently enough in the applications that it deserves special mention. A best least-squares approximation to a given matrix A = VΣW* ∈ M_n by a rank 1 matrix X ∈ M_n is X = A+B = V(Σ+E)W* = V diag(σ_1, 0, ..., 0)W* = σ_1vw*, where σ_1 is the largest singular value of A, and v and w are the first columns of the unitary matrices V and W in a singular value decomposition of A, respectively. A useful observation about v and w is that they are unit vector solutions of the pair of Hermitian eigenvalue-eigenvector problems

\[ AA^* v = \sigma_1^2 v, \qquad A^*A w = \sigma_1^2 w \]

where σ_1^2 is the largest eigenvalue of the positive semidefinite matrix A*A (and AA*). This observation does not uniquely determine v and w, of course; one difficulty is that the eigenspaces associated with σ_1^2 need not


be one-dimensional. If σ_1^2 is a simple eigenvalue of A*A (and hence of AA*), however, the eigenvectors v and w are determined up to scalar factors of modulus 1 and must therefore be scalar multiples of the respective first columns of the unitary matrices V and W in a singular value decomposition A = VΣW*. In this case, for fixed choices of unit eigenvectors v and w, a best rank 1 approximation to A must be of the form e^{iθ}σ_1vw* for some θ ∈ R. The scalar factor e^{iθ} must be chosen to minimize ‖A − e^{iθ}σ_1vw*‖_2^2 = ‖A‖_2^2 − 2σ_1 Re[tr e^{−iθ}A(vw*)*] + σ_1^2‖v‖_2^2‖w‖_2^2, a problem equivalent to maximizing Re[tr e^{−iθ}A(vw*)*] = Re[e^{−iθ}v*Aw]. But Aw = VΣW*w = e^{iφ}σ_1v for some φ ∈ R [see Problem 5 of Section (7.3)] and hence |v*Aw| = σ_1 > 0. Thus, the optimal scalar factor is e^{iθ} = v*Aw/|v*Aw| = v*Aw/σ_1 and a best rank 1 approximation to A is

\[ e^{i\theta}\sigma_1 v w^* = (v^*Aw)\, v w^* \]

This shows that if the largest eigenvalue of A*A is simple, a best rank 1 least-squares approximation to A can be constructed without further effort from the solutions of two Hermitian eigenvalue problems. The condition of a simple maximal eigenvalue for A*A is met, for example, by any nonnegative matrix A ∈ M_n(R) such that AA^T is positive or, more generally, irreducible [see Problem 17 in Section (8.4)].

7.4.2 Example. In Theorem (5.7.17) we showed that a vector norm G(·) on M_n satisfies the condition

\[ G(A_1)\,G(A_2)\cdots G(A_k) \ge \rho(A_1 A_2 \cdots A_k) \]

for all A_1, A_2, ..., A_k ∈ M_n and all k = 1, 2, ... if and only if G(·) has a compatible vector norm on C^n. A crucial step in this argument was to show that if G(·) obeys this inequality with respect to the spectral radius, then there is some finite constant c > 0 such that G(A_1)G(A_2)···G(A_k) ≥ c|||A_1A_2···A_k|||_2, and the key to showing this is the singular value decomposition of the product A_1A_2···A_k. The details are in Lemma (5.7.16).

7.4.3 Example. Suppose one wishes to solve a system of linear equations Ax = b, where A ∈ M_{m,n} and b ∈ C^m are given, and A has rank k. If A = VΣW* is a singular value decomposition of A, then VΣW*x = b, or

\[ \Sigma(W^*x) = V^*b \tag{7.4.4} \]

If m > k, then the last m−k rows of Σ are 0, and hence if there is to be a solution in this case it is necessary (and also sufficient) that the last m−k entries of V*b be zero. Thus, the system Ax = b is solvable when m > k if and only if b is orthogonal to the last m−k left singular vectors of A. Under this consistency condition, and if V = [v_1 ... v_m] and W = [w_1 ... w_n], then (7.4.4) says that

\[ (W^*x)^* = \begin{bmatrix} \dfrac{b^*v_1}{\sigma_1} & \cdots & \dfrac{b^*v_k}{\sigma_k} & 0 & \cdots & 0 \end{bmatrix} \]

and hence the vector

\[ \sum_{i=1}^{k} \frac{v_i^* b}{\sigma_i}\, w_i \tag{7.4.5} \]

is a solution. Since Aw_j = V(ΣW*w_j) = 0 for all j > k, any linear combination of the last n−k right singular vectors of A (if any) is in the null space of A and hence the vector

\[ \sum_{i=1}^{k} \frac{v_i^* b}{\sigma_i}\, w_i + \sum_{i=k+1}^{n} c_i w_i \]

will be a solution to Ax = b for any c_{k+1}, ..., c_n ∈ C; this last sum is, of course, absent if n = k. Because the vectors {w_i} are orthonormal, the solution with minimum l_2 norm is obtained when all c_i = 0. Notice that the last m−k left singular vectors of A span the null space of AA*, which is the same as the null space of A*, so requiring that b be orthogonal to the last m−k left singular vectors of A is the same as requiring that b be orthogonal to every solution to A*x = 0.

Exercise. If not all of the last m−k elements of V*b are zero, then the system Ax = b is inconsistent and there is no solution at all. For some purposes, one may be satisfied with a "least-squares" solution, however, which is a vector x ∈ C^n of minimum l_2 norm such that ‖Ax − b‖_2 is minimized. Show that (7.4.5) gives such a least-squares solution.

7.4.6 Example. What is the best least-squares approximation to a given A ∈ M_n by a scalar multiple of a unitary matrix? Recall that the l_2 norm on M_n is generated by the inner product [A, B] = tr AB*, and that if U is unitary, then

\[ \|U\|_2^2 = [U, U] = \operatorname{tr} UU^* = \operatorname{tr} I = n \]

For any c ∈ C and for any unitary U ∈ M_n we have

\[ \|A - cU\|_2^2 = [A - cU,\ A - cU] = \|A\|_2^2 - 2\operatorname{Re}\{\bar{c}\,[A, U]\} + n|c|^2 \]

which is minimized when c = [A, U]/n, and hence

\[ \|A - cU\|_2^2 \ge \|A\|_2^2 - \frac{1}{n}\,\bigl|[A, U]\bigr|^2 \]


If we define

\[ u(A) = \max_{\text{unitary } U \in M_n} \bigl|[A, U]\bigr| \tag{7.4.7} \]

then we obtain a quantity analogous to the numerical radius r(A), for which the maximum of the inner product is taken not over unitary matrices but over all rank 1 Hermitian matrices of Frobenius norm 1. Unlike the numerical radius, however, the function u(A) is a matrix norm on M_n [see Problem 5 and Example (7.4.54)].

It is easy to identify the value of u(A) as well as the extremal unitary matrix. Let a singular value decomposition of A be A = VΣW*. Then

\[
\begin{aligned}
u(A) &= \max_{\text{unitary } U} \bigl|[A, U]\bigr| = \max_{\text{unitary } U} \bigl|[V\Sigma W^*, U]\bigr| \\
&= \max_{\text{unitary } U} |\operatorname{tr} V\Sigma W^*U^*| = \max_{\text{unitary } U} |\operatorname{tr} \Sigma (W^*U^*V)| \\
&= \max_{\text{unitary } U} |\operatorname{tr} \Sigma U| = \max_{\text{unitary } U = [u_{ij}]} \Bigl|\sum_{i=1}^{n} \sigma_i u_{ii}\Bigr| \\
&\le \max_{\text{unitary } U = [u_{ij}]} \sum_{i=1}^{n} \sigma_i |u_{ii}| \le \sum_{i=1}^{n} \sigma_i
\end{aligned}
\]

But if A = PU is the polar form of A, then

\[ [A, U] = \operatorname{tr} PUU^* = \operatorname{tr} P = \sum_{i=1}^{n} \sigma_i \]

so the upper bound given for u(A) is sharp, u(A) = σ_1(A) + ··· + σ_n(A), and a best least-squares approximation of A by a multiple of a unitary matrix is given by

\[ \frac{1}{n}(\sigma_1 + \cdots + \sigma_n)\, U \]

if A = PU is the polar form of A and σ_1, ..., σ_n are its singular values. If a singular value decomposition A = VΣW* is given, then U = VW*. The error in the approximation is

\[ \Bigl\|A - \frac{1}{n}\Bigl(\sum_{i=1}^{n}\sigma_i\Bigr)U\Bigr\|_2^2 = \|A\|_2^2 - \frac{1}{n}\bigl|[A, U]\bigr|^2 = \sum_{i=1}^{n}\sigma_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n}\sigma_i\Bigr)^{2} \]

which is 0 only when the Cauchy-Schwarz inequality

\[ \Bigl(\sum_{i=1}^{n}\sigma_i\Bigr)^{2} \le n \sum_{i=1}^{n}\sigma_i^2 \]

is an equality. Thus, A can be perfectly approximated by a multiple of a unitary matrix only when all its singular values are equal.
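
The conclusion of Example (7.4.6) can be illustrated numerically. The following sketch assumes NumPy; the function name `nearest_scaled_unitary` and the random test data are hypothetical. It forms the approximation c U with U = VW* and c equal to the mean of the singular values, compares the squared error with the closed-form expression, and checks that randomly chosen unitary matrices (each with its own optimal scalar) do no better.

```python
import numpy as np

def nearest_scaled_unitary(A):
    """Return (c, U, s): U = V W* and c = mean of the singular values s of A."""
    V, s, Wh = np.linalg.svd(A)       # A = V diag(s) Wh, with Wh = W*
    U = V @ Wh                        # unitary factor of the polar form A = P U
    c = s.sum() / A.shape[0]
    return c, U, s

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
c, U, s = nearest_scaled_unitary(A)

err_sq = np.linalg.norm(A - c * U) ** 2
predicted = np.sum(s ** 2) - s.sum() ** 2 / n

# For comparison: random unitary matrices, each with its own best scalar
# c_r = [A, U_r] / n = tr(A U_r*) / n.
other_errors = []
for _ in range(20):
    Ur, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    c_r = np.trace(A @ Ur.conj().T) / n
    other_errors.append(np.linalg.norm(A - c_r * Ur) ** 2)
```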


7.4.8 Example. Suppose A, B ∈ M_{m,n} are given, and we wish to determine whether A was produced by "rotating" B; that is, is A = UB for some unitary matrix U ∈ M_m? More generally, if we consider all the possible "rotations" UB of the given matrix B, how well can we approximate A in the sense of least squares? This is known in factor analysis as the problem of finding a "procrustean transformation" of B.

The computations are very similar to those in the previous example; we seek to choose U to minimize ‖A − UB‖_2, and we compute, as before,

\[ \|A - UB\|_2^2 = [A - UB,\ A - UB] = \|A\|_2^2 - 2\operatorname{Re}[A, UB] + \|B\|_2^2 \]

Thus, we must find a unitary matrix U that maximizes Re[A, UB] = Re tr AB*U*. If AB* = VΣW* is a singular value decomposition of AB*, then

\[ \operatorname{Re}\operatorname{tr} AB^*U^* = \operatorname{Re}\operatorname{tr} V\Sigma W^*U^* = \operatorname{Re}\operatorname{tr} \Sigma W^*U^*V = \operatorname{Re}\sum_{i=1}^{m} \sigma_i(AB^*)\, t_{ii} \]

where T = [t_ij] = W*U*V is a unitary matrix. This sum is maximized when all t_ii = 1, that is, when U = VW*; VW* is just the unitary part of a polar decomposition of AB*.

Thus, a best least-squares approximation to A ∈ M_{m,n} by a matrix of the form UB, where B ∈ M_{m,n} and U ∈ M_m is unitary, is given by UB = (VW*)B, where AB* = VΣW* is a singular value decomposition of AB*, or AB* = P(VW*) is a polar decomposition of AB*; we do not need to know V and W separately. The error in this approximation is given by

\[ \min\{\|A - UB\|_2 : U \in M_m \text{ is unitary}\} = \|A - (VW^*)B\|_2 = \Bigl[\|A\|_2^2 + \|B\|_2^2 - 2\sum_{i=1}^{m}\sigma_i(AB^*)\Bigr]^{1/2} \]

where {σ_i(AB*)} is the set of singular values of AB*.
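
The recipe U = VW* for the procrustean problem can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `best_rotation`, the dimensions, and the noise level are assumptions. The first test recovers an exact rotation, and the second compares the residual for a noisy A with the error formula above.

```python
import numpy as np

def best_rotation(A, B):
    """Unitary U = V W* from an SVD A B* = V Sigma W*, together with the singular values."""
    V, s, Wh = np.linalg.svd(A @ B.conj().T)   # Wh = W*
    return V @ Wh, s

rng = np.random.default_rng(7)
m, n = 3, 5
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
Q, _ = np.linalg.qr(rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))

# Case 1: A is exactly a rotation of B.
A_exact = Q @ B
U_exact, _ = best_rotation(A_exact, B)
residual_exact = np.linalg.norm(A_exact - U_exact @ B)

# Case 2: A is a rotation of B plus noise; compare with the error formula.
A_noisy = Q @ B + 0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
U_noisy, s = best_rotation(A_noisy, B)
residual_noisy = np.linalg.norm(A_noisy - U_noisy @ B)
predicted = np.sqrt(np.linalg.norm(A_noisy) ** 2 + np.linalg.norm(B) ** 2 - 2 * s.sum())
```

In the exact case BB* is nonsingular for this random B, so the unitary polar factor of AB* is unique and the computed U coincides with the rotation Q up to rounding.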


If we want to know whether A is exactly a rotation of B, then an 


obvious necessary condition is that |A|,=|Bl,, and a necessary and 
sufficient condition is that 


|A = |B|} = È o(AB) 


j= 


where {o;(AB*)} is the set of singular values of AB*. 
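The recipe of Example (7.4.8) can be tried numerically. The following is a minimal sketch, assuming NumPy is available; the function name `best_rotation` is our own choice for illustration and does not appear in the text. It forms $AB^*$, takes a singular value decomposition, and returns the unitary factor $VW^*$ together with the error predicted by the formula above.

```python
import numpy as np

def best_rotation(A, B):
    """Return a unitary U minimizing ||A - U B||_F and the minimal error.

    Illustrative helper (name is an assumption, not from the text).
    """
    # A B* = V diag(s) W*; NumPy returns W* directly as the third factor.
    V, s, Wh = np.linalg.svd(A @ B.conj().T)
    U = V @ Wh  # unitary part V W* of a polar decomposition of A B*
    # Squared error from the example: ||A||^2 + ||B||^2 - 2 * sum of sigma_i(A B*)
    err_sq = (np.linalg.norm(A, 'fro') ** 2
              + np.linalg.norm(B, 'fro') ** 2
              - 2.0 * s.sum())
    # Guard against tiny negative values caused by rounding.
    return U, np.sqrt(max(err_sq, 0.0))
```

If $A$ really is a rotation of $B$, the returned error should be zero up to rounding, which is the necessary and sufficient condition stated at the end of the example.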


Finally, if we consider the special case $m = n$ and $B = I$, then we have the result that a best least-squares approximation of a given matrix $A \in M_n$ by a unitary matrix $U \in M_n$ is given by $U = VW^*$, where $A = V\Sigma W^*$ is a singular value decomposition of $A$, or where $A = PU = P(VW^*)$ is a polar decomposition of $A$; the error in the approximation is
\[
\|A - VW^*\|_2^2 = \|A\|_2^2 + \|I\|_2^2 - 2\sum_{i=1}^{n} \sigma_i(A)
= \sum_{i=1}^{n} \sigma_i^2(A) + n - 2\sum_{i=1}^{n} \sigma_i(A) = \sum_{i=1}^{n} (\sigma_i(A) - 1)^2
\]
where $\{\sigma_i(A)\}$ is the set of singular values of $A$.

As part of the discussion in the preceding example, we found a solution to the problem of maximizing $\mathrm{Re}\,\mathrm{tr}\, AU$ over all unitary matrices $U \in M_n$. For convenience of later reference, we summarize this result as

7.4.9 Theorem. Let $A \in M_n$ be a given matrix, and let $A = V\Sigma W^*$ be a singular value decomposition of $A$. Then (a) The problem
\[
\max\{\mathrm{Re}\,\mathrm{tr}\, AU : U \in M_n \text{ is unitary}\}
\]
has the solution $U = WV^*$, and the value of the maximum is $\sigma_1(A) + \cdots + \sigma_n(A)$, where $\{\sigma_i(A)\}$ is the set of singular values of $A$. (b) There is a unitary matrix $U \in M_n$ such that $AU \in M_n$ is a positive semidefinite Hermitian matrix. A unitary matrix $U$ is a maximizing matrix for the problem in (a) if and only if $AU$ is positive semidefinite; $U$ is uniquely determined if $A$ is nonsingular. The eigenvalues of $AU$ are the singular values of $A$.

Proof: Compute
\[
\mathrm{Re}\,\mathrm{tr}\, AU = \mathrm{Re}\,\mathrm{tr}\, V\Sigma W^*U = \mathrm{Re}\,\mathrm{tr}\, \Sigma(W^*UV) = \sum_{i=1}^{n} \sigma_i\, \mathrm{Re}\,(W^*UV)_{ii}
\]
which is maximized only when all $(W^*UV)_{ii} = 1$. Since $W^*UV$ is unitary, this happens if and only if $W^*UV = I$, or $U = WV^*$. For this choice of $U$, $AU = V\Sigma W^*WV^* = V\Sigma V^*$, which is Hermitian and positive semidefinite since $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ and all $\sigma_i \ge 0$. If $U_1 \in M_n$ is any unitary matrix for which $AU_1$ is positive semidefinite, the eigenvalues of $AU_1$ are the singular values of $A$ because the singular values are unitarily invariant. The uniqueness in the nonsingular case follows from the uniqueness part of (7.3.3). □

For any matrix $A \in M_{m,n}$, $AA^*$ and $A^*A$ are both positive semidefinite and $\mathrm{tr}\, AA^* = \mathrm{tr}\, A^*A = \sigma_1^2(A) + \cdots + \sigma_{\min\{m,n\}}^2(A)$, which may be thought of as a sum of products of singular values of $A$ and $A^*$, respectively, since the singular values of $A^*$ are the same as the singular values of $A$. This simple observation has a generalization to any pair of matrices $A$, $B$ for which the products $AB$ and $BA$ are defined and positive semidefinite. This result is useful in approaching several types of matrix optimization problems.

7.4.10 Theorem. Let $A \in M_{m,n}$, $B \in M_{n,m}$, and $q = \min\{m, n\}$. Let $\sigma_1(A), \ldots, \sigma_q(A)$ and $\sigma_1(B), \ldots, \sigma_q(B)$ denote the singular values of $A$ and $B$, respectively, arranged in nonincreasing order. If both $AB \in M_m$ and $BA \in M_n$ are positive semidefinite, then there exists a permutation $\tau$ of the integers $1, 2, \ldots, q$ such that
\[
\mathrm{tr}\, AB = \mathrm{tr}\, BA = \sum_{i=1}^{q} \sigma_i(A)\,\sigma_{\tau(i)}(B) \tag{7.4.11}
\]

Proof: If $m = n$, if each of $A$ and $B$ is positive semidefinite, and if $A$ and $B$ commute, then they can be simultaneously unitarily diagonalized as $A = U\Lambda U^*$ and $B = UMU^*$, where $U \in M_m$ is unitary, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, $M = \mathrm{diag}(\mu_1, \ldots, \mu_m)$, and all $\lambda_i$, $\mu_i$ are nonnegative. In this case we have
\[
\mathrm{tr}\, AB = \mathrm{tr}\,(U\Lambda U^*)(UMU^*) = \mathrm{tr}\, U\Lambda M U^* = \mathrm{tr}\, \Lambda M = \sum_{i=1}^{m} \lambda_i \mu_i
\]
Since the eigenvalues $\lambda_i$, $\mu_i$ are also the singular values of $A$ and $B$, the theorem is proved in this particular case.

There is no loss of generality to assume that $m \le n$, for if $m > n$, one can just interchange $A$ and $B$ in the statement of the theorem.

To prove the theorem in general, we claim that it suffices to show that for any pair of matrices $A \in M_{m,n}$ and $B \in M_{n,m}$ such that $m \le n$ and both $AB$ and $BA$ are positive semidefinite, there is a unitary matrix $V \in M_n$ and a matrix $Y \in M_{m,n}$ with orthonormal rows such that the transformation
\[
\hat{A} = Y^*AV \quad \text{and} \quad \hat{B} = V^*BY \tag{7.4.12}
\]
produces a pair of commuting positive semidefinite $n$-by-$n$ matrices $\hat{A}$ and $\hat{B}$. In this event we have
\[
\mathrm{tr}\, AB = \mathrm{tr}\, ABYY^* = \mathrm{tr}\, Y^*ABY = \mathrm{tr}\,(Y^*AV)(V^*BY)
= \sum_{i=1}^{n} \sigma_i(Y^*AV)\,\sigma_{\tau(i)}(V^*BY) = \sum_{i=1}^{n} \sigma_i(\hat{A})\,\sigma_{\tau(i)}(\hat{B})
\]
by the above observation. Notice that $\hat{A}^*\hat{A} = V^*A^*YY^*AV = V^*A^*AV = (AV)^*(AV)$, so the singular values of $\hat{A}$ are the same as those of $AV$, which are the same as those of $A$ since $(AV)(AV)^* = AA^*$. A similar argument shows that the singular values of $\hat{B}$ are the same as those of $B$, so we conclude that
\[
\mathrm{tr}\, AB = \sum_{i=1}^{q} \sigma_i(A)\,\sigma_{\tau(i)}(B)
\]


as claimed. We proceed in three steps to establish the existence of a transformation of the form (7.4.12) with the required properties.

(1) Let $A$ and $B$ satisfy the hypotheses of the theorem. Recall from (1.3.20) that the eigenvalues of $BA$ are the same as those of $AB$ (counting multiplicities) together with an additional $n - m$ zero eigenvalues. If $\lambda_1, \ldots, \lambda_m$ are the eigenvalues of $AB$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$, then since both $AB$ and $BA$ are Hermitian by assumption, there are unitary matrices $U \in M_m$ and $V \in M_n$ such that
\[
AB = U\Lambda U^* \quad \text{and} \quad BA = V \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} V^*
\]
If we partition $V = [V_1 \; V_2]$ with $V_1 \in M_{n,m}$ and $V_2 \in M_{n,n-m}$, then $V_1$ is a matrix with orthonormal columns, so $V_1^*V_1 = I \in M_m$. Then $\Lambda = U^*ABU$ and $BA = V_1\Lambda V_1^*$, so $BA = (V_1U^*)AB(UV_1^*)$. Let $Y = UV_1^* \in M_{m,n}$ and observe that $YY^* = UV_1^*V_1U^* = UU^* = I$, so $Y$ has orthonormal rows and $BA = Y^*ABY$. Set $\hat{A} = Y^*A \in M_n$ and $\hat{B} = BY \in M_n$ and compute $\hat{A}\hat{B} = Y^*ABY = BA$ and $\hat{B}\hat{A} = BYY^*A = BA$; the product $BA$ is positive semidefinite by assumption. Thus, there is a transformation of the form (7.4.12) [with $V = I$] that yields a commuting pair of $n$-by-$n$ matrices whose product is positive semidefinite. The individual terms $\hat{A}$ and $\hat{B}$ may not be positive semidefinite, however; a further transformation of the form (7.4.12) may be required to achieve this.

(2) Without loss of generality we may now assume that $m = n$, that $A, B \in M_n$ commute, and that the product $AB$ is positive semidefinite. If $(AB)x = \lambda x$ with $x \ne 0$, then $(AB)(Ax) = ABAx = AABx = A(ABx) = A\lambda x = \lambda(Ax)$, so each of the eigenspaces of the Hermitian matrix $AB$ is invariant under $A$. The same argument shows that each of these eigenspaces is also invariant under $B$. Thus, if $U = [u_1 \; \ldots \; u_n]$ is a unitary matrix whose columns consist of eigenvectors of $AB$, and if the columns are arranged so that all the eigenvectors corresponding to the same eigenvalue of $AB$ occur contiguously, then both $U^*AU$ and $U^*BU$ must be block diagonal with
\[
\hat{A} = U^*AU = \mathrm{diag}(A_1, A_2, \ldots, A_r), \qquad \hat{B} = U^*BU = \mathrm{diag}(B_1, B_2, \ldots, B_r)
\]
where $A_i, B_i \in M_{k_i}$, $1 \le k_i \le n$, $k_1 + \cdots + k_r = n$, and each $A_iB_i = B_iA_i = \lambda_i I \in M_{k_i}$, where $\lambda_1, \lambda_2, \ldots, \lambda_r$ are the distinct (nonnegative) eigenvalues of the positive semidefinite matrix $AB$.

(3) Without loss of generality we may now assume that $m = n$, $A, B \in M_n$, $A$ and $B$ commute, and $AB = \lambda I$ with $\lambda \ge 0$. If $\lambda > 0$, then both $A$ and $B$ are nonsingular and $B = \lambda A^{-1}$. Use (7.4.9) to find a unitary matrix $U \in M_n$ such that $\hat{A} = AU$ is positive semidefinite. But then $\hat{B} = U^*B = \lambda U^*A^{-1} = \lambda (AU)^{-1}$ is also positive semidefinite since $\lambda > 0$ and $(AU)$ is positive semidefinite. Furthermore, $(U^*B)(AU) = U^*\lambda I U = \lambda I = AB = (AU)(U^*B)$, so $\hat{A}$ and $\hat{B}$ commute. This is a transformation of the form (7.4.12), so we are done if $\lambda > 0$.

If $\lambda = 0$, then $AB = BA = 0$. Again choose a unitary $U$ such that $AU$ is positive semidefinite. Then $0 = AB = (AU)(U^*B) = (U^*B)(AU) = U^*0U = 0$, so $AU$ and $U^*B$ commute and every eigenspace of the Hermitian matrix $AU$ is invariant under $U^*B$. If $W = [w_1 \; \ldots \; w_n]$ is a unitary matrix whose columns consist of eigenvectors of $AU$, and if the columns are arranged so that all the eigenvectors corresponding to the same eigenvalue of $AU$ occur contiguously, then both $W^*(AU)W$ and $W^*(U^*B)W$ are block diagonal with
\[
W^*(AU)W = \mathrm{diag}(A_1, \ldots, A_r), \qquad W^*(U^*B)W = \mathrm{diag}(B_1, \ldots, B_r)
\]
$A_i$ and $B_i$ are the same size, and $A_i = \lambda_i I$, $i = 1, 2, \ldots, r$, where $\lambda_1, \lambda_2, \ldots, \lambda_r$ are the distinct (nonnegative) eigenvalues of the positive semidefinite matrix $AU$. We have $A_iB_i = B_iA_i = 0$ for all $i = 1, \ldots, r$. If $\lambda_i \ne 0$, then the pair $A_i = \lambda_i I$, $B_i = 0$ is a commuting pair of positive semidefinite matrices, as required. If $\lambda_i = 0$, then $B_i$ is not necessarily zero, but there is a unitary matrix $U_i$ such that $U_i^*B_i$ is positive semidefinite [apply (7.4.9) to $B_i^*$] and, in this case, $A_iU_i = 0$ and $U_i^*B_i$ constitute a commuting positive semidefinite pair obtained by a transformation of the form (7.4.12). This completes examination of all possible cases. □

7.4.13 Example. As a variation on the rotation problem in (7.4.8), let $A, B \in M_{m,n}$ be given and suppose we wish to determine whether $A$ was produced by a two-sided "rotation" of $B$; that is, is $A = UBV$ for some unitary matrices $U \in M_m$, $V \in M_n$? More generally, if we consider all the possible two-sided "rotations" $UBV$ of the given matrix $B$, how well can we approximate $A$ in the sense of least squares?

As before, we seek to choose unitary matrices $U \in M_m$ and $V \in M_n$ to minimize $\|A - UBV\|_2$, and we compute as before
\[
\|A - UBV\|_2^2 = [A - UBV,\, A - UBV] = \|A\|_2^2 - 2\,\mathrm{Re}\,[A, UBV] + \|B\|_2^2
\]
Thus, we must find unitary matrices $U \in M_m$ and $V \in M_n$ that maximize $\mathrm{Re}\,[A, UBV] = \mathrm{Re}\,\mathrm{tr}\, AV^*B^*U^*$. Maximizing unitary matrices $U_0$, $V_0$ for this problem must exist (but are not necessarily unique) because the sets of unitary matrices in $M_m$ and $M_n$ are compact and the Cartesian product of compact sets is compact. The maximizing matrices $U_0$, $V_0$ have the property that
\[
\mathrm{Re}\,\mathrm{tr}\,(AV_0^*B^*)U_0^* \ge \mathrm{Re}\,\mathrm{tr}\,(AV_0^*B^*)U
\]
for any unitary matrix $U \in M_m$, so by (7.4.9) we know that $AV_0^*B^*U_0^*$ is positive semidefinite. By the same argument,

\[
\mathrm{Re}\,\mathrm{tr}\, AV_0^*B^*U_0^* = \mathrm{Re}\,\mathrm{tr}\,(B^*U_0^*A)V_0^* \ge \mathrm{Re}\,\mathrm{tr}\, B^*U_0^*AV
\]
for any unitary matrix $V \in M_n$, so by (7.4.9) again we know that $B^*U_0^*AV_0^*$ is positive semidefinite. Thus, the two matrices $AV_0^* \in M_{m,n}$ and $B^*U_0^* \in M_{n,m}$ satisfy the hypotheses of Theorem (7.4.10) and hence if $q = \min\{m, n\}$ we have
\[
\max\{\mathrm{Re}\,\mathrm{tr}\, AV^*B^*U^* : U \in M_m \text{ and } V \in M_n \text{ are unitary}\}
= \mathrm{Re}\,\mathrm{tr}\, AV_0^*B^*U_0^* = \sum_{i=1}^{q} \sigma_i(AV_0^*)\,\sigma_{\tau(i)}(B^*U_0^*) = \sum_{i=1}^{q} \sigma_i(A)\,\sigma_{\tau(i)}(B)
\]
for some permutation $\tau$ of the integers $1, \ldots, q$ since the singular values are unitarily invariant. There is no loss of generality if we write the singular values $\sigma_1(A), \ldots, \sigma_q(A)$ and $\sigma_1(B), \ldots, \sigma_q(B)$ in decreasing order. If the permutation $\tau$ is not the identity, there are indices $i_1$, $i_2$ with $1 \le i_1 < i_2 \le q$ for which $\sigma_{\tau(i_1)}(B) \le \sigma_{\tau(i_2)}(B)$, and one checks easily that the sum
\[
\sum_{i=1}^{q} \sigma_i(A)\,\sigma_{\tau(i)}(B)
\]
is not decreased if the permutation is altered to exchange the positions of these two singular values. In fact, the difference between the new and old values of the sum is
\[
[\sigma_{i_1}(A) - \sigma_{i_2}(A)]\,[\sigma_{\tau(i_2)}(B) - \sigma_{\tau(i_1)}(B)] \ge 0
\]
Thus, the maximum value of the sum is achieved for the identity permutation $\tau$, and we can conclude that
\[
\max\{\mathrm{Re}\,\mathrm{tr}\, AV^*B^*U^* : U \in M_m,\ V \in M_n \text{ are unitary}\} = \sum_{i=1}^{q} \sigma_i(A)\,\sigma_i(B) \tag{7.4.14}
\]
where the singular values of $A$ and $B$ are both arranged in decreasing order.

Using this result in our original minimization problem, we find for $A, B \in M_{m,n}$ and $q = \min\{m, n\}$ that
\[
\begin{aligned}
\min\{\|A - UBV\|_2 : U \in M_m \text{ and } V \in M_n \text{ are unitary}\}
&= \Bigl[\|A\|_2^2 - 2\sum_{i=1}^{q} \sigma_i(A)\sigma_i(B) + \|B\|_2^2\Bigr]^{1/2} \\
&= \Bigl[\sum_{i=1}^{q} \sigma_i^2(A) - 2\sum_{i=1}^{q} \sigma_i(A)\sigma_i(B) + \sum_{i=1}^{q} \sigma_i^2(B)\Bigr]^{1/2} \\
&= \Bigl[\sum_{i=1}^{q} (\sigma_i(A) - \sigma_i(B))^2\Bigr]^{1/2}
\end{aligned}
\tag{7.4.15}
\]
In particular, $A$ is a "two-sided rotation" of $B$ if and only if $A$ and $B$ have the same set of singular values. □
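The minimum in (7.4.15) can be checked numerically. The sketch below assumes NumPy; the function name `two_sided_rotation` and the specific choice of $U$ and $V$ built from the two singular value decompositions are our own illustration (the text proves only that maximizers exist and gives the optimal value). With $A = V_a \Sigma_a W_a^*$ and $B = V_b \Sigma_b W_b^*$, taking $U = V_a V_b^*$ and $V = W_b W_a^*$ gives $UBV = V_a \Sigma_b W_a^*$, whose distance from $A$ is exactly the right-hand side of (7.4.15).

```python
import numpy as np

def two_sided_rotation(A, B):
    """Return unitary U, V attaining min ||A - U B V||_F, and that minimum.

    Illustrative construction (an assumption, not quoted from the text).
    """
    # Full SVDs with singular values in decreasing order.
    Va, sa, Wah = np.linalg.svd(A)
    Vb, sb, Wbh = np.linalg.svd(B)
    U = Va @ Vb.conj().T      # m-by-m unitary
    V = Wbh.conj().T @ Wah    # n-by-n unitary
    # Right-hand side of (7.4.15).
    err = np.sqrt(np.sum((sa - sb) ** 2))
    return U, V, err
```

Comparing `np.linalg.norm(A - U @ B @ V, 'fro')` with the returned `err`, and with the error for randomly drawn unitary pairs, is a quick sanity check of the bound.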


Exercise. What does (7.4.15) say if $B = I$? Compare with the result at the end of Example (7.4.8). What does (7.4.15) say if $B$ is diagonal and has rank $k$? Compare with the comments in Example (7.4.1).

7.4.16 Example. As another example of the use of singular values, we consider the question of characterizing unitarily invariant norms of matrices, which were introduced in Section (5.6).

Definition: A vector norm $\|\cdot\|$ on $M_{m,n}$ is said to be unitarily invariant if
\[
\|UAV\| = \|A\|
\]
for all $A \in M_{m,n}$ and for all unitary matrices $U \in M_m$, $V \in M_n$.

If $A \in M_{m,n}$ is a given matrix and if $A = V\Sigma W^*$ is a singular value decomposition of $A$, then $\|A\| = \|V\Sigma W^*\| = \|\Sigma\|$ for any unitarily invariant norm $\|\cdot\|$. Thus, a unitarily invariant norm of a matrix of a given size depends only on the set of singular values of the matrix.

Two familiar examples of unitarily invariant norms are the Frobenius (Euclidean) norm and the spectral norm. If the singular values of $X = [x_{ij}] \in M_{m,n}$ are $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_q \ge 0$ ($q = \min\{m, n\}$), then
\[
\|X\|_2 = \Bigl(\sum_{j=1}^{n} \sum_{i=1}^{m} |x_{ij}|^2\Bigr)^{1/2} = \Bigl(\sum_{j=1}^{q} \sigma_j^2\Bigr)^{1/2}
\]
and
\[
|||X|||_2 = \max_{y \ne 0} \frac{\|Xy\|_2}{\|y\|_2} = [\rho(X^*X)]^{1/2} = \sigma_1 = \max\{\sigma_1, \ldots, \sigma_q\}
\]
For a general unitarily invariant norm $\|\cdot\|$ on $M_{m,n}$, the nature of its dependence on the singular values of its argument is easily determined. For convenience, assume that $m \le n$, let $\Lambda = \mathrm{diag}(x_1, x_2, \ldots, x_m) \in M_m$, and define the partitioned matrix
\[
X = [\Lambda \;\; 0], \qquad \Lambda \in M_m,\ 0 \in M_{m,n-m}
\]
Since $XX^* = \mathrm{diag}(|x_1|^2, |x_2|^2, \ldots, |x_m|^2)$, the set of singular values of $X$ is $\{\sigma_i\} = \{|x_i|\}$. If we define the function $g : \mathbf{C}^m \to \mathbf{R}^+$ by
\[
g(x) = g([x_1, \ldots, x_m]^T) = \|X\|
\]
then the function $g(\cdot)$ inherits certain properties from the norm $\|\cdot\|$:

(7.4.17) $g(x) \ge 0$ for all $x \in \mathbf{C}^m$ since $\|X\| \ge 0$ for all $X \in M_{m,n}$.

(7.4.18) $g(x) = 0$ if and only if $x = 0$ since $\|X\| = 0$ if and only if $X = 0$.


(7.4.19) $g(\alpha x) = |\alpha|\, g(x)$ for all $x \in \mathbf{C}^m$ and all $\alpha \in \mathbf{C}$ since $\|\alpha X\| = |\alpha|\, \|X\|$ for all $\alpha \in \mathbf{C}$ and all $X \in M_{m,n}$.

(7.4.20) $g(x + y) \le g(x) + g(y)$ for all $x, y \in \mathbf{C}^m$ since $\|X + Y\| \le \|X\| + \|Y\|$ for all $X, Y \in M_{m,n}$.

These four properties say that $g(\cdot)$ must be a vector norm on $\mathbf{C}^m$, but $g(\cdot)$ has two additional properties:

(7.4.21) $g(\cdot)$ is an absolute norm on $\mathbf{C}^m$, as defined in (5.5.9); that is, if $x = [x_i] \in \mathbf{C}^m$ and if $y = [y_i] = [|x_i|] \in \mathbf{C}^m$, then $g(x) = g(y)$. This is because $g(x) = \|X\|$ depends only on the singular values of $X$, which are $\sigma_i = |x_i|$.

(7.4.22) If $P \in M_m$ is a permutation matrix, then $g(Px) = g(x)$ for all $x \in \mathbf{C}^m$ because the set of singular values of $X = [\Lambda \;\; 0]$ is the same as that of $[P\Lambda \;\; 0]$ since $(P\Lambda)^*(P\Lambda) = \Lambda^*P^TP\Lambda = \Lambda^*\Lambda$. The function $g(x)$ is a function of the set of absolute values of the components of $x$ without regard to their ordering.

Exercise. Compute an explicit singular value decomposition for the matrix $X = [\Lambda \;\; 0] \in M_{m,n}$, $\Lambda = \mathrm{diag}(x_1, \ldots, x_m)$, which we have been discussing.

Exercise. If $m \ge n$, let $X = [\Lambda \;\; 0]^T$ with $\Lambda = \mathrm{diag}(x_1, \ldots, x_n) \in M_n$ and define $g(x) = \|X\|$ for $x \in \mathbf{C}^n$. If $\|\cdot\|$ is a unitarily invariant norm on $M_{m,n}$, show that $g(\cdot)$ is an absolute vector norm on $\mathbf{C}^n$ such that $g(Px) = g(x)$ for all $x \in \mathbf{C}^n$ and every permutation matrix $P \in M_n$.

Exercise. Show directly that the vector norms $g(\cdot)$ derived from the Frobenius and spectral norms satisfy the six properties (7.4.17)-(7.4.22) above.

7.4.23 Definition. A function $g(\cdot) : \mathbf{C}^q \to \mathbf{R}^+$ is said to be a symmetric gauge function if and only if it satisfies the six properties (7.4.17)-(7.4.22) above, that is, if and only if it is an absolute vector norm that is a permutation-invariant function of the entries of its argument.

We have seen that every unitarily invariant norm on $M_{m,n}$ generates a symmetric gauge function; it is more interesting that the converse is true as well. The following theorem says that a function $N(\cdot)$ on $M_{m,n}$ is a unitarily invariant norm if and only if $N(A)$ is a symmetric gauge function of the singular values of $A$.

7.4.24 Theorem. Let $\|\cdot\|$ be a unitarily invariant norm on $M_{m,n}$, let $q = \min\{m, n\}$, let $x = [x_i] \in \mathbf{C}^q$, let $X_1 = \mathrm{diag}(x_1, \ldots, x_q)$, and let $X = [X_1 \;\; 0] \in M_{m,n}$ if $m \le n$ or $X = [X_1 \;\; 0]^T \in M_{m,n}$ if $m \ge n$. Let $g : \mathbf{C}^q \to \mathbf{R}^+$ be defined by $g(x) = \|X\|$. Then $g(\cdot)$ is a symmetric gauge function. Conversely, if $g : \mathbf{C}^q \to \mathbf{R}^+$ is a given symmetric gauge function, and if $\|\cdot\| : M_{m,n} \to \mathbf{R}^+$ is defined by $\|A\| = g([\sigma_1, \ldots, \sigma_q]^T)$, where $\sigma_1, \ldots, \sigma_q$ are the singular values of $A$, then $\|\cdot\|$ is a unitarily invariant norm on $M_{m,n}$.

Proof: The forward assertions have already been proved. For the converse, observe that $\|\cdot\|$ is a well-defined function on $M_{m,n}$ because $g(\cdot)$ is a permutation-invariant function of the components of its argument. Because the set of singular values of a matrix is a unitary invariant, we also have $\|UAV\| = \|A\|$ for all unitary $U \in M_m$ and $V \in M_n$. Because $g(\cdot)$ is a vector norm, we have $\|A\| \ge 0$ for all $A \in M_{m,n}$, and $\|A\| = 0$ if and only if $g([\sigma_1, \ldots, \sigma_q]^T) = 0$, and this can happen if and only if all $\sigma_i = 0$ since $g(\cdot)$ is positive (7.4.18). But the zero matrix is the only matrix whose singular values are all zero, so the function $\|\cdot\|$ is positive (5.1.1(1a)). It is also homogeneous since $\sigma_i(cA) = |c|\,\sigma_i(A)$ and hence $\|cA\| = g([|c|\sigma_1, \ldots, |c|\sigma_q]^T) = |c|\, g([\sigma_1, \ldots, \sigma_q]^T) = |c|\, \|A\|$. What we have shown so far is that any function $\|\cdot\|$ generated in this way by a symmetric gauge function is a pre-norm on $M_{m,n}$ [see (5.4)]. It remains to be shown that $\|\cdot\|$ satisfies the triangle inequality, and we shall do so by showing that $\|\cdot\|$ is the dual of a pre-norm and hence [see the discussion following (5.4.12)] is actually a norm.

Consider the dual $g^D(\cdot)$ of the norm $g(\cdot)$ on $\mathbf{C}^q$:
\[
g^D(y) = \max_{g(x) = 1} \mathrm{Re}\, y^*x \tag{7.4.25}
\]
The function $g^D(\cdot)$ is always a norm because $g(\cdot)$ is a (pre-)norm, but it is also a symmetric gauge function because

(7.4.21') If $E = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_q})$ with all $\theta_i \in \mathbf{R}$, then
\[
g^D(Ey) = \max_{g(x) = 1} \mathrm{Re}\,(Ey)^*x = \max_{g(x) = 1} \mathrm{Re}\, y^*(E^*x) = \max_{g(Ex) = 1} \mathrm{Re}\, y^*x = \max_{g(x) = 1} \mathrm{Re}\, y^*x = g^D(y)
\]
because $g(\cdot)$ satisfies (7.4.21). Thus, $g^D(\cdot)$ also satisfies (7.4.21).

(7.4.22') The same argument shows that if $P \in M_q$ is a permutation matrix, then
\[
g^D(Py) = \max_{g(x) = 1} \mathrm{Re}\,(Py)^*x = \max_{g(x) = 1} \mathrm{Re}\, y^*P^Tx = \max_{g(Px) = 1} \mathrm{Re}\, y^*x = \max_{g(x) = 1} \mathrm{Re}\, y^*x = g^D(y)
\]
because $g(\cdot)$ satisfies (7.4.22).


Thus, we can define the function $\|\cdot\|^D$ on $M_{m,n}$ associated with the symmetric gauge function $g^D(\cdot)$:
\[
\|A\|^D = g^D([\sigma_1, \ldots, \sigma_q]^T)
\]
where $\sigma_1, \ldots, \sigma_q$ are the singular values of $A$. [There is a conscious abuse of notation here: $\|\cdot\|^D$ usually denotes the dual of the norm $\|\cdot\|$; although we do not yet know that $\|\cdot\|$ is a norm, we shall show that this is the case and that $\|\cdot\|^D$, as defined in terms of a symmetric gauge function $g^D(\cdot)$, is its dual.] We have already shown that this function $\|\cdot\|^D$ is a pre-norm on $M_{m,n}$ since it is defined in terms of a symmetric gauge function $g^D(\cdot)$.

Now compute the dual of $\|\cdot\|^D$, which is guaranteed to be a norm on $M_{m,n}$ by (5.4.12). Observe that a matrix $B \in M_{m,n}$ satisfies $\|B\|^D = 1$ if and only if a singular value decomposition of $B$ is $B = V\Sigma W^*$ with unitary matrices $V \in M_m$ and $W \in M_n$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_q)$, and $g^D([\sigma_1, \ldots, \sigma_q]^T) = 1$. For each given matrix $A \in M_{m,n}$ we have
\[
\begin{aligned}
(\|A\|^D)^D &= \max_{\|B\|^D = 1} \mathrm{Re}\,[A, B] = \max_{\|B\|^D = 1} \mathrm{Re}\,\mathrm{tr}\, AB^* \\
&= \max\{\mathrm{Re}\,\mathrm{tr}\, A(V\Sigma W^*)^* : V \in M_m \text{ and } W \in M_n \text{ are unitary, } \Sigma = \mathrm{diag}(s_1, \ldots, s_q), \text{ and } g^D([s_1, \ldots, s_q]^T) = 1\}
\end{aligned}
\]
For each diagonal matrix $\Sigma$ satisfying the constraint, we can use (7.4.14) to evaluate the maximum value that can be achieved over all choices of unitary $V$, $W$:
\[
(\|A\|^D)^D = \max\Bigl\{\sum_{i=1}^{q} \sigma_i(A)\, s_i : g^D([s_1, \ldots, s_q]^T) = 1\Bigr\}
\]
But since all $\sigma_i(A) \ge 0$ it is apparent from Definition (5.4.12) that this maximum is exactly the dual norm of $g^D(\cdot)$ evaluated at the point $[\sigma_1(A), \ldots, \sigma_q(A)]^T$. The duality theorem (5.5.14), however, guarantees that the dual of the dual of a norm is the original norm, so
\[
(\|A\|^D)^D = (g^D)^D([\sigma_1(A), \ldots, \sigma_q(A)]^T) = g([\sigma_1(A), \ldots, \sigma_q(A)]^T) = \|A\|
\]
Thus, for all $A \in M_{m,n}$, $\|A\| = (\|A\|^D)^D$, which guarantees that $\|\cdot\|$ is actually a norm, and hence it satisfies the triangle inequality. This conclusion also justifies our abuse of notation, since $(\|A\|)^D = ((\|A\|^D)^D)^D = \|A\|^D$ by the duality theorem. Thus, $\|\cdot\|^D$, defined via the symmetric gauge function $g^D(\cdot)$, is actually the same as the dual of the norm $\|\cdot\|$. □

An important and familiar example of a family of symmetric gauge functions on $\mathbf{C}^n$ is the family of $l_p$ norms (5.2.4)


\[
g([x_1, \ldots, x_n]^T) = \|x\|_p = \Bigl(\sum_{i=1}^{n} |x_i|^p\Bigr)^{1/p}, \qquad 1 \le p < \infty
\]
When applied to the singular values of a matrix, as described in Theorem (7.4.24), the $l_p$ norms generate unitarily invariant norms on $M_{m,n}$ known as Schatten $p$ norms. The case $p = 2$ is the Frobenius (Euclidean) norm
\[
\|A\|_2 = \Bigl[\sum_{i} \sigma_i^2(A)\Bigr]^{1/2}
\]
the limiting case $p \to \infty$ is the spectral norm
\[
|||A|||_2 = \max\{\sigma_i(A)\}
\]
and the case $p = 1$ is the trace norm
\[
\|A\|_{\mathrm{tr}} = \sum_{i} \sigma_i(A)
\]
The trace norm arose naturally in Example (7.4.6) when we considered the problem of approximating a given square matrix by a scalar multiple of a unitary matrix.

Another family of symmetric gauge functions on $\mathbf{C}^n$, which also includes the trace norm and the spectral norm, is given in (7.4.44).
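As an illustration of Theorem (7.4.24) applied to the $l_p$ gauge functions, the following sketch computes a Schatten $p$ norm directly from the singular values. It assumes NumPy; the function name `schatten_norm` is ours, chosen for illustration only.

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p norm: the l_p norm of the vector of singular values of A.

    Use p = np.inf for the limiting case (the spectral norm).
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values of A
    if np.isinf(p):
        return s.max()
    return np.sum(s ** p) ** (1.0 / p)
```

For $p = 2$, $p = 1$, and $p = \infty$ the result can be compared with `np.linalg.norm(A, 'fro')`, `np.linalg.norm(A, 'nuc')`, and `np.linalg.norm(A, 2)`, respectively. Multiplying `A` on either side by an orthogonal matrix should leave the value unchanged, which is the unitary invariance asserted by the theorem.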


7.4.26 Example. Singular values play an important role in deriving an inequality of Wielandt that gives a geometric meaning to the condition number of a square nonsingular matrix with respect to the spectral norm. Let $A \in M_n$ be a nonsingular matrix, let $B = A^*A \in M_n$, and denote the singular values of $A$ by $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n > 0$. The eigenvalues of the positive definite matrix $B$ (arranged in the conventional increasing order) are $0 < \sigma_n^2 \le \sigma_{n-1}^2 \le \cdots \le \sigma_1^2$. Let $x, y \in \mathbf{C}^n$ be any pair of orthonormal vectors, define $C = [x \; y]^*B[x \; y] \in M_2$, and denote the eigenvalues of $C$ by $0 < \gamma_1 \le \gamma_2$. The Poincaré separation theorem (4.3.16) with $r = 2$ says that
\[
\lambda_k(B) = \sigma_{n-k+1}^2 \le \lambda_k(C) = \gamma_k \le \lambda_{n+k-2}(B) = \sigma_{3-k}^2, \qquad k = 1, 2
\]
or
\[
\sigma_n^2 \le \gamma_1 \le \sigma_2^2 \quad \text{and} \quad \sigma_{n-1}^2 \le \gamma_2 \le \sigma_1^2
\]
For our purposes, the only interesting implication of these inequalities is
\[
\sigma_n^2 \le \gamma_1 \le \gamma_2 \le \sigma_1^2 \tag{7.4.27}
\]
in which the first and last inequalities are equalities if $x$ and $y$ are orthonormal eigenvectors of $B$ corresponding to eigenvalues that are squares of the largest and smallest singular values of $A$, respectively.


Compute
\[
\begin{aligned}
\frac{|x^*By|^2}{(x^*Bx)(y^*By)} &= 1 - \frac{(x^*Bx)(y^*By) - |x^*By|^2}{(x^*Bx)(y^*By)} \\
&= 1 - \frac{4 \det C}{(x^*Bx + y^*By)^2 - (x^*Bx - y^*By)^2} \\
&= 1 - \frac{4 \det C}{(\mathrm{tr}\, C)^2 - (x^*Bx - y^*By)^2} \\
&= 1 - \frac{4\gamma_1\gamma_2}{(\gamma_1 + \gamma_2)^2 - (x^*Bx - y^*By)^2} \le 1 - \frac{4\gamma_1\gamma_2}{(\gamma_1 + \gamma_2)^2}
\end{aligned}
\tag{7.4.28}
\]
with equality if and only if $x, y \in \mathbf{C}^n$ are orthonormal and $x^*Bx = y^*By$. We transform this inequality into the equivalent inequality
\[
\frac{|x^*By|^2}{(x^*Bx)(y^*By)} \le 1 - \frac{4\gamma_1\gamma_2}{(\gamma_1 + \gamma_2)^2} = \Bigl(\frac{\gamma_2 - \gamma_1}{\gamma_2 + \gamma_1}\Bigr)^2 = \Bigl(\frac{\gamma_2/\gamma_1 - 1}{\gamma_2/\gamma_1 + 1}\Bigr)^2 \tag{7.4.29}
\]
The upper bound in (7.4.29) is a monotonically increasing function of the ratio $\gamma_2/\gamma_1$ [as may be shown easily by observing that the derivative of the function $f(t) = (t - 1)/(t + 1)$ is positive for $t > 0$]. By (7.4.27), this ratio has the upper bound $\sigma_1^2/\sigma_n^2$, and hence
\[
\frac{|x^*By|^2}{(x^*Bx)(y^*By)} \le \Bigl(\frac{\sigma_1^2/\sigma_n^2 - 1}{\sigma_1^2/\sigma_n^2 + 1}\Bigr)^2 = \Bigl(\frac{\kappa^2 - 1}{\kappa^2 + 1}\Bigr)^2 \tag{7.4.30}
\]
where we have introduced the positive parameter $\kappa = \kappa(A) = \sigma_1/\sigma_n = |||A|||_2\, |||A^{-1}|||_2 \ge 1$, which is the condition number of $A$ with respect to the spectral norm. If $u_1, u_n \in \mathbf{C}^n$ are orthonormal eigenvectors of $B$ corresponding to the eigenvalues $\sigma_1^2$ and $\sigma_n^2$, respectively, and if $x = (u_1 + u_n)/\sqrt{2}$, $y = (u_1 - u_n)/\sqrt{2}$, then $\{x, y\}$ is an orthonormal set, $x^*Bx = y^*By = (\sigma_1^2 + \sigma_n^2)/2$, and $x^*By = (\sigma_1^2 - \sigma_n^2)/2$, so equality is attained in (7.4.30) in this case.

Define the angle $\theta$ in the first quadrant by $\cot(\theta/2) = \kappa$, so that
\[
\frac{\kappa^2 - 1}{\kappa^2 + 1} = \frac{\cot^2(\theta/2) - 1}{\cot^2(\theta/2) + 1} = \frac{\cos^2(\theta/2) - \sin^2(\theta/2)}{\cos^2(\theta/2) + \sin^2(\theta/2)} = \cos\theta
\]
and (7.4.30) can be written in the form
\[
\frac{|x^*By|^2}{(x^*Bx)(y^*By)} \le \cos^2\theta \tag{7.4.31}
\]
If we now observe that the left-hand side of this inequality is homogeneous of degree 0 in both $x$ and $y$, we can finally state Wielandt's inequality in two equivalent forms:

7.4.32 Theorem. Let $A \in M_n$ be a given nonsingular matrix with spectral condition number $\kappa$, and define the angle $\theta$ in the first quadrant by $\cot(\theta/2) = \kappa$. Then
\[
|\langle Ax, Ay \rangle| \le \cos\theta\, \|Ax\|_2\, \|Ay\|_2 \tag{7.4.33}
\]
for every pair of orthogonal vectors $x, y \in \mathbf{C}^n$, where $\langle u, v \rangle = v^*u$ denotes the Euclidean inner product and $\|u\|_2 = (u^*u)^{1/2}$ denotes the Euclidean norm. Moreover, there exists an orthonormal pair of vectors $x, y \in \mathbf{C}^n$ for which equality holds in (7.4.33).

7.4.34 Theorem. Let $B \in M_n$ be a given positive definite matrix with eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Then
\[
|x^*By|^2 \le \Bigl(\frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}\Bigr)^2 (x^*Bx)(y^*By) \tag{7.4.35}
\]
for every pair of orthogonal vectors $x, y \in \mathbf{C}^n$. Moreover, there exists an orthonormal pair of vectors $x, y \in \mathbf{C}^n$ for which equality holds in (7.4.35).

Proofs: The inequality (7.4.33) follows from (7.4.31) by substituting $B = A^*A$. The inequality (7.4.35) follows from (7.4.30) by substituting $\sigma_i^2 = \lambda_{n-i+1}$ and recognizing that every positive definite matrix $B$ is of the form $B = A^*A$ for some nonsingular $A \in M_n$; one may take $A = B^{1/2}$. We have already observed that equality can be attained in (7.4.30) by an orthonormal pair. □

Exercise. Show that (7.4.35) is an improvement on the general Cauchy-Schwarz inequality, which is $|x^*By| = |\langle Cy, Cx \rangle| \le \|Cx\|_2\, \|Cy\|_2$, where $C = B^{1/2}$. However, the Cauchy-Schwarz inequality applies to all pairs $x$, $y$, whereas (7.4.35) applies only to orthogonal pairs. What happens if $\lambda_1 = \lambda_n$?

The form (7.4.33) of Wielandt's inequality leads immediately to a useful geometrical interpretation of the spectral condition number. If $x, y \in \mathbf{C}^n$ is an arbitrary orthonormal pair, the left-hand side of the inequality
\[
\frac{|\langle Ax, Ay \rangle|}{\|Ax\|_2\, \|Ay\|_2} \le \cos\theta \tag{7.4.36}
\]
is the ordinary cosine of the smaller Euclidean angle between the nonzero vectors $Ax$ and $Ay$. This bound says that the smaller angle between $Ax$ and $Ay$ is at least $\theta = \theta(A)$, where $\theta(A)$ is defined by $\cot[\theta(A)/2] = \kappa(A)$. Since equality is possible in this bound, we have established the geometrical interpretation of $\theta(A)$ as the minimum angle between $Ax$ and $Ay$ as $x$ and $y$ range over all possible orthonormal pairs of vectors. This point was discussed in Sections (5.8) and (6.3).


A well-known inequality of Kantorovich follows easily from Wielandt's inequality. For any $x \in \mathbf{C}^n$, define
\[
y = \|x\|_2^2\,(B^{-1}x) - (x^*B^{-1}x)\,x \tag{7.4.37}
\]
and notice that $x^*y = 0$. Compute
\[
\begin{aligned}
By &= \|x\|_2^2\, x - (x^*B^{-1}x)\, Bx \\
x^*By &= \|x\|_2^4 - (x^*B^{-1}x)(x^*Bx) \\
y^*By &= -(x^*B^{-1}x)(y^*Bx)
\end{aligned}
\]
Since $B$, and hence $B^{-1}$ also, is positive definite, we must have $y^*By \ge 0$ and hence $y^*Bx = x^*By \le 0$. Write the inequality (7.4.31) in the form
\[
|x^*By|^2 \le \cos^2\theta\,(x^*Bx)(y^*By)
\]
and substitute the values for the particular choice (7.4.37) of the pair $x$, $y$ to obtain
\[
|x^*By|^2 \le (\cos^2\theta)(x^*Bx)(x^*B^{-1}x)(-x^*By)
\]
In either of the two possible cases $x^*By < 0$ or $x^*By = 0$, this implies that
\[
-x^*By = -[\|x\|_2^4 - (x^*B^{-1}x)(x^*Bx)] \le (\cos^2\theta)(x^*Bx)(x^*B^{-1}x)
\]
or
\[
(\sin^2\theta)(x^*Bx)(x^*B^{-1}x) \le \|x\|_2^4 \tag{7.4.38}
\]
for any $x \in \mathbf{C}^n$. Notice that (7.4.38) is an equality if $x = u_1 + u_n$ is the sum of orthonormal eigenvectors of $B$ corresponding to its smallest and largest eigenvalues. This leads to two equivalent forms of Kantorovich's inequality corresponding to the two forms of Wielandt's inequality.

7.4.39 Theorem. Let $A \in M_n$ be a given nonsingular matrix with spectral condition number $\kappa$ and define the angle $\theta$ in the first quadrant by $\cot(\theta/2) = \kappa$. Then
\[
\|x\|_2^2 \ge \sin\theta\, \|Ax\|_2\, \|(A^*)^{-1}x\|_2 \tag{7.4.40}
\]
for all $x \in \mathbf{C}^n$. Moreover, there is a unit vector $x$ for which (7.4.40) is an equality.

7.4.41 Theorem. Let $B \in M_n$ be a given positive definite matrix with eigenvalues $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Then
\[
\|x\|_2^4 \ge \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}\,(x^*Bx)(x^*B^{-1}x) \tag{7.4.42}
\]
for all $x \in \mathbf{C}^n$. Moreover, there is a unit vector $x$ for which (7.4.42) is an equality.

Proofs: These two results follow from (7.4.38) by substituting $B = A^*A$ into (7.4.38) and by recognizing that
\[
\sin^2\theta = 1 - \cos^2\theta = 1 - \Bigl(\frac{\kappa^2 - 1}{\kappa^2 + 1}\Bigr)^2 = \frac{4\kappa^2}{(\kappa^2 + 1)^2} = \frac{4\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}
\]
The fact that equality is possible in both cases follows from the case of equality for (7.4.38). □
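Both the Wielandt bound (7.4.35) and the Kantorovich bound (7.4.42) are easy to probe numerically. The sketch below assumes NumPy and a real symmetric or complex Hermitian positive definite `B`; the function names are ours and purely illustrative. It evaluates the two ratios that the theorems bound, together with the bounds themselves computed from the extreme eigenvalues.

```python
import numpy as np

def wielandt_ratio(B, x, y):
    """|x*By|^2 / ((x*Bx)(y*By)); bounded by (7.4.35) when x, y are orthogonal."""
    num = abs(np.vdot(x, B @ y)) ** 2
    den = np.vdot(x, B @ x).real * np.vdot(y, B @ y).real
    return num / den

def kantorovich_ratio(B, x):
    """(x*Bx)(x*B^{-1}x) / ||x||_2^4; bounded via (7.4.42)."""
    a = np.vdot(x, B @ x).real
    b = np.vdot(x, np.linalg.solve(B, x)).real
    return a * b / np.vdot(x, x).real ** 2

def bounds(B):
    """Return the two upper bounds from the extreme eigenvalues of B."""
    lam = np.linalg.eigvalsh(B)  # ascending order
    l1, ln = lam[0], lam[-1]
    wielandt = ((ln - l1) / (ln + l1)) ** 2
    kantorovich = (l1 + ln) ** 2 / (4.0 * l1 * ln)  # reciprocal of the factor in (7.4.42)
    return wielandt, kantorovich
```

With $u_1$, $u_n$ taken as unit eigenvectors for the smallest and largest eigenvalues, the pair $x = (u_1 + u_n)/\sqrt{2}$, $y = (u_1 - u_n)/\sqrt{2}$ should attain the first bound and $x = u_1 + u_n$ the second, matching the equality cases noted in the text.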


7.4.43 Example. It is sometimes possible to prove norm inequalities for matrices that hold for all unitarily invariant norms. The key to doing so lies in recognizing the fundamental role of the particular symmetric gauge functions $g_k([x_1, \ldots, x_n]^T)$ defined on $\mathbf{C}^n$ by
\[
g_k(x) = \max\{|x_{i_1}| + \cdots + |x_{i_k}| : 1 \le i_1 < i_2 < \cdots < i_k \le n\}, \qquad k = 1, 2, \ldots, n \tag{7.4.44}
\]
When applied to the singular values of a matrix, as described in Theorem (7.4.24), this particular family of symmetric gauge functions generates a family of unitarily invariant norms on $M_{m,n}$ known as Ky Fan $k$ norms. The case $k = 1$ is the spectral norm and the case $k = \min\{m, n\}$ is the trace norm.

7.4.45 Theorem. Let $x = [x_i]$, $y = [y_i] \in \mathbf{C}^n$ be given vectors. Then $g(x) \le g(y)$ for all symmetric gauge functions $g(\cdot)$ on $\mathbf{C}^n$ if and only if $g_k(x) \le g_k(y)$ for $k = 1, 2, \ldots, n$, where the $g_k(\cdot)$ are the particular symmetric gauge functions defined in (7.4.44).

Proof: Since each $g_k(\cdot)$ is a symmetric gauge function, the necessity of the condition is clear. To prove sufficiency, suppose $g_k(x) \le g_k(y)$ for $k = 1, 2, \ldots, n$ and let $g(\cdot)$ be a given symmetric gauge function. Because a symmetric gauge function is a permutation-invariant function of the components of its argument (7.4.22), there is no loss of generality if we assume for convenience that the absolute values of the components of $x$ and $y$ are each arranged in nondecreasing order
\[
|x_1| \le |x_2| \le \cdots \le |x_n|, \qquad |y_1| \le |y_2| \le \cdots \le |y_n|
\]
The assumption that $g_k(x) \le g_k(y)$ for all $k = 1, 2, \ldots, n$ is then equivalent to the set of $n$ inequalities


\[
\begin{aligned}
|x_n| &\le |y_n| \\
|x_{n-1}| + |x_n| &\le |y_{n-1}| + |y_n| \\
&\;\;\vdots \\
|x_2| + \cdots + |x_n| &\le |y_2| + \cdots + |y_n| \\
|x_1| + |x_2| + \cdots + |x_n| &\le |y_1| + |y_2| + \cdots + |y_n| \qquad (*)
\end{aligned}
\tag{7.4.46}
\]
The similarity between these inequalities and the defining inequalities (4.3.24) for majorization is not merely superficial.

If the last of these inequalities (*) is not an equality, modify $y$ by decreasing the absolute value of the component $y_1$ until either (a) inequality (*) is an equality or (b) $|y_1|$ is decreased to zero. If (b) occurs before (a), repeat this process with the next component $y_2$, and so on until (a) occurs. The result will be to produce a modified vector $y' = [y_i']$ such that $|y_i'| \le |y_i|$ for $i = 1, \ldots, n$, $g_k(x) \le g_k(y')$ for all $k = 1, \ldots, n$, and equality holds in (*). Because an absolute norm is also a monotone norm (5.5.10), we have $g(y') \le g(y)$. Thus, if we can show that $g(x) \le g(y)$ for any $x, y \in \mathbf{C}^n$ that satisfy the inequalities (7.4.46) and for which (*) is an equality, we can conclude that $g(x) \le g(y)$ for any $x, y \in \mathbf{C}^n$ that satisfy (7.4.46) in general.

The assumption that (7.4.46) holds with equality in (*) is exactly the assumption that the vector $-|x| = [-|x_i|] \in \mathbf{R}^n$ majorizes the vector $-|y| = [-|y_i|] \in \mathbf{R}^n$ (4.3.24), and in this event we know (4.3.33) that there is a doubly stochastic matrix $S \in M_n$ such that $-|x| = S(-|y|)$, or $|x| = S|y|$. Since every doubly stochastic matrix is a convex combination of finitely many permutation matrices (8.7.1), we can write $S = \alpha_1P_1 + \cdots + \alpha_NP_N$, where $\alpha_i \ge 0$, $\alpha_1 + \cdots + \alpha_N = 1$, and each $P_i \in M_n$ is a permutation matrix. In this event, we have
\[
g(x) = g(|x|) = g(S|y|) = g\Bigl(\sum_{i=1}^{N} \alpha_i P_i |y|\Bigr) \le \sum_{i=1}^{N} \alpha_i\, g(P_i|y|) = \sum_{i=1}^{N} \alpha_i\, g(|y|) = g(|y|) = g(y)
\]
because $g(\cdot)$ is an absolute vector norm that is a permutation-invariant function of the components of its argument. □

The significance of the theorem is that in order to have $\|A\| \le \|B\|$ for every unitarily invariant norm $\|\cdot\|$ on $M_{m,n}$ it is necessary and sufficient that this inequality hold for the Ky Fan $k$ norms, $k = 1, 2, \ldots, \min\{m, n\}$.

7.4.47 Corollary. Let $A, B \in M_{m,n}$ be given matrices with respective singular values $\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0$ and $\sigma_1(B) \ge \cdots \ge \sigma_q(B) \ge 0$, where $q = \min\{m, n\}$. In order that $\|A\| \le \|B\|$ for every unitarily invariant norm $\|\cdot\|$ on $M_{m,n}$ it is sufficient that
\[
\sigma_i(A) \le \sigma_i(B) \quad \text{for all } i = 1, 2, \ldots, q \tag{7.4.48}
\]
and it is necessary and sufficient that
\[
\begin{aligned}
\sigma_1(A) &\le \sigma_1(B) \\
\sigma_1(A) + \sigma_2(A) &\le \sigma_1(B) + \sigma_2(B) \\
&\;\;\vdots \\
\sigma_1(A) + \sigma_2(A) + \cdots + \sigma_q(A) &\le \sigma_1(B) + \cdots + \sigma_q(B)
\end{aligned}
\tag{7.4.49}
\]

Proof: The key observation required is that a unitarily invariant norm on $M_{m,n}$ is a symmetric gauge function of the singular values of its argument (7.4.24). The sufficiency of (7.4.48) requires only the fact that a symmetric gauge function is a monotone norm (5.5.10), while the more subtle assertion about the inequalities (7.4.49) is the content of the preceding theorem. □

In order to apply Corollary (7.4.47) to prove norm inequalities, it is frequently useful to have the following restatement of the fact that the vector of eigenvalues of a sum of Hermitian matrices majorizes the sum of the respective vectors of ordered eigenvalues.

7.4.50 Lemma. Let $A, B \in M_n$ be Hermitian matrices with ordered eigenvalues $\lambda_1(A) \le \cdots \le \lambda_n(A)$ and $\lambda_1(B) \le \cdots \le \lambda_n(B)$, and let $\lambda_1(A - B) \le \cdots \le \lambda_n(A - B)$ denote the ordered eigenvalues of $A - B$. Then the vector $\lambda(A) - \lambda(B) = [\lambda_i(A) - \lambda_i(B)]$ majorizes the vector $\lambda(A - B) = [\lambda_i(A - B)]$; that is,
\[
\min\Bigl\{\sum_{j=1}^{k} [\lambda_{i_j}(A) - \lambda_{i_j}(B)] : 1 \le i_1 < \cdots < i_k \le n\Bigr\} \ge \sum_{i=1}^{k} \lambda_i(A - B)
\]
for $k = 1, 2, \ldots, n$, with equality for $k = n$.

Proof: Theorem (4.3.27) says that the vector $\lambda(A) = \lambda((A - B) + B) = [\lambda_i((A - B) + B)]$ of eigenvalues of $(A - B) + B = A$ majorizes the vector $\lambda(A - B) + \lambda(B) = [\lambda_i(A - B) + \lambda_i(B)]$, which is equivalent to having the vector $\lambda(A) - \lambda(B)$ majorize the vector $\lambda(A - B)$. □




With the conditions in (7.4.47) and the preceding lemma as tools, it is
often possible to generalize approximation theorems or inequalities for
the Frobenius norm or the spectral norm to the whole class of unitarily
invariant norms.

For example, (7.4.15) says that if $A, B \in M_{m,n}$ are given matrices with
respective ordered singular values
$\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0$ and
$\sigma_1(B) \ge \cdots \ge \sigma_q(B) \ge 0$, with $q = \min\{m, n\}$, then

\[
\|A - B\|_2 \ge \Big( \sum_{i=1}^{q} [\sigma_i(A) - \sigma_i(B)]^2 \Big)^{1/2}
\]

Another way to write this lower bound is

\[
\|A - B\|_2 \ge \|\Sigma(A) - \Sigma(B)\|_2
\]

where $A = V_1 \Sigma(A) W_1^*$ and $B = V_2 \Sigma(B) W_2^*$ are singular value
decompositions in which the respective singular values are ordered from
largest to smallest on the “diagonal” of $\Sigma(A)$ and $\Sigma(B)$. Another
instance of this inequality, for the spectral norm, is (7.3.8(a)). It is in
this form that (7.4.15) generalizes to all unitarily invariant norms.


7.4.51 Theorem. Let $A, B \in M_{m,n}$ be given matrices with singular value
decompositions $A = V_1 \Sigma(A) W_1^*$ and $B = V_2 \Sigma(B) W_2^*$ with
unitary $V_1, V_2 \in M_m$ and unitary $W_1, W_2 \in M_n$, and in which the
“diagonal” elements of both $\Sigma(A)$ and $\Sigma(B)$ are arranged in
decreasing order. Then $\|A - B\| \ge \|\Sigma(A) - \Sigma(B)\|$ for every
unitarily invariant norm $\|\cdot\|$ on $M_{m,n}$.


Proof: Let $q = \min\{m, n\}$. Use (7.3.7) to identify the singular values
of $A$,

\[
\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0
\]

with the first $q$ nonpositive eigenvalues of the Hermitian matrix

\[
\tilde{A} \equiv \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} \in M_{m+n},
\]

of which the $m+n$ ordered eigenvalues are

\[
-\sigma_1(A) \le -\sigma_2(A) \le \cdots \le -\sigma_q(A) \le 0 = \cdots = 0 \le \sigma_q(A) \le \cdots \le \sigma_1(A)
\]

and similarly for $B$ and $A - B$. The differences of the ordered eigenvalues
of $\tilde{A}$ and $\tilde{B}$ are
$\pm[\sigma_1(A) - \sigma_1(B)], \ldots, \pm[\sigma_q(A) - \sigma_q(B)]$
together with 0 ($|m-n|$ times). Although it is not clear how to order this
sequence in general, the $q$ smallest elements in an ordering of this
sequence are $\{-|\sigma_i(A) - \sigma_i(B)|\}$, and Lemma (7.4.50) applied to
$\tilde{A}$, $\tilde{B}$, and $\tilde{A} - \tilde{B}$ assures us that

\[
\sum_{i=1}^{k} -\sigma_i(A-B) \le \min\Big\{ \sum_{j=1}^{k} -|\sigma_{i_j}(A) - \sigma_{i_j}(B)| : 1 \le i_1 < \cdots < i_k \le q \Big\}
\]

for $k = 1, \ldots, q$, which is equivalent to

\[
\sum_{i=1}^{k} \sigma_i(A-B) \ge \max\Big\{ \sum_{j=1}^{k} |\sigma_{i_j}(A) - \sigma_{i_j}(B)| : 1 \le i_1 < \cdots < i_k \le q \Big\}
\]

for $k = 1, \ldots, q$. Since $\{|\sigma_i(A) - \sigma_i(B)|\}$ is the set of
singular values of $\Sigma(A) - \Sigma(B)$, Corollary (7.4.47) guarantees that
$\|A - B\| \ge \|\Sigma(A) - \Sigma(B)\|$ for any unitarily invariant norm
$\|\cdot\|$. □
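The inequality in Theorem (7.4.51) can be spot-checked numerically. The sketch below is only an illustration, not part of the proof: it restricts attention to real 2-by-2 matrices, for which the singular values have a closed form, and it tests just three unitarily invariant norms (spectral, Frobenius, trace). The function names are hypothetical.

```python
import math
import random


def singular_values_2x2(m):
    """Singular values (largest first) of a real 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    t = a * a + b * b + c * c + d * d        # sigma_1^2 + sigma_2^2
    p = abs(a * d - b * c)                   # sigma_1 * sigma_2 = |det|
    s_sum = math.sqrt(t + 2 * p)             # sigma_1 + sigma_2
    s_dif = math.sqrt(max(t - 2 * p, 0.0))   # sigma_1 - sigma_2
    return (s_sum + s_dif) / 2, (s_sum - s_dif) / 2


# Three unitarily invariant norms, written as functions of the
# ordered singular values (s[0] >= s[1] >= 0).
NORMS = {
    "spectral": lambda s: s[0],
    "frobenius": lambda s: math.hypot(s[0], s[1]),
    "trace": lambda s: s[0] + s[1],
}


def mirsky_gaps(a, b):
    """Return ||A - B|| - ||Sigma(A) - Sigma(B)|| for each norm in NORMS."""
    diff = [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]
    s_diff = singular_values_2x2(diff)
    sa, sb = singular_values_2x2(a), singular_values_2x2(b)
    # Sigma(A) - Sigma(B) is diagonal, so its singular values are the
    # absolute values of its diagonal entries, largest first.
    d = sorted((abs(sa[0] - sb[0]), abs(sa[1] - sb[1])), reverse=True)
    return {name: f(s_diff) - f(d) for name, f in NORMS.items()}


def smallest_gap(trials=2000, seed=0):
    """Smallest gap seen over random pairs; the theorem predicts it is >= 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(2)]
        b = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(2)]
        worst = min(worst, min(mirsky_gaps(a, b).values()))
    return worst
```

Up to rounding error, `smallest_gap()` should never be negative; a negative value of noticeable size would indicate a bug in the sketch rather than a counterexample.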


7.4.52 Example. One consequence of Theorem (7.4.51) is a generalization
of the problem of finding a best (in the sense of least squares) rank $k$
approximation to a given matrix $A \in M_n$, considered in Example (7.4.1)
for the Frobenius norm. If $\|\cdot\|$ is a unitarily invariant norm and if
$B \in M_n$ has rank $k$, then
$\sigma_1(B) \ge \cdots \ge \sigma_k(B) > 0 = \sigma_{k+1}(B) = \cdots = \sigma_n(B)$.
Thus,

\[
\begin{aligned}
\|A - B\| &\ge \|\Sigma(A) - \Sigma(B)\| \\
  &= \|\operatorname{diag}(\sigma_1(A) - \sigma_1(B), \ldots, \sigma_k(A) - \sigma_k(B), \sigma_{k+1}(A), \ldots, \sigma_n(A))\| \\
  &\ge \|\operatorname{diag}(0, \ldots, 0, \sigma_{k+1}(A), \ldots, \sigma_n(A))\|
\end{aligned}
\]

where we have used the fact that a unitarily invariant norm on diagonal
matrices is a monotone norm because it is a symmetric gauge function of
the diagonal entries. Furthermore, equality is possible for $B = V E W^*$,
where $A = V \Sigma(A) W^*$ is a singular value decomposition for $A$ and
$E = \operatorname{diag}[\sigma_1(A), \ldots, \sigma_k(A), 0, \ldots, 0]$.

Thus, for any $A \in M_n$ and any $B \in M_n$ of rank $k$, we have the bounds

\[
\|A - B\| \ge \|\operatorname{diag}(0, \ldots, 0, \sigma_{k+1}(A), \ldots, \sigma_n(A))\|
  \ge \sigma_n(A)\, \|\operatorname{diag}(0, \ldots, 0, 1, \ldots, 1)\|
\]

for any unitarily invariant norm (there are $k$ zero terms on the diagonal
of the last expression) in which the first inequality, but not generally
the second, is sharp. The second inequality (which follows solely from
monotonicity of symmetric gauge functions if $A$ is nonsingular and is
trivial if $A$ is singular) has the advantage that its dependence on the norm
is a function of $k$ only, and not of $A$. In particular, this says that for
any nonsingular matrix $A \in M_n$ and any unitarily invariant norm
$\|\cdot\|$, we have the sharp bound

\[
\|A - B\| \ge \sigma_n(A)\, \|\operatorname{diag}(0, \ldots, 0, 1)\| \tag{7.4.53}
\]

for the distance between $A$ and any singular matrix $B$; that is, the
minimum distance from $A$ to the closed set of singular matrices (with respect
to the unitarily invariant norm $\|\cdot\|$) is
$\sigma_n(A)\, \|\operatorname{diag}(0, \ldots, 0, 1)\|$.
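As an illustration of the sharp bound (7.4.53), the following sketch constructs a nearest singular matrix for a real 2-by-2 matrix and measures the distance in the Frobenius norm, for which the norm of diag(0, 1) equals 1. It assumes the construction B = A - (Av)v^T, where v is a unit right singular vector belonging to the smallest singular value; this is the matrix V E W* of Example (7.4.52) with k = n - 1. The function names are hypothetical.

```python
import math


def nearest_singular_2x2(a):
    """Return (B, sigma_min) for a real 2x2 matrix A.

    B = A - (A v) v^T, where v is a unit eigenvector of A^T A for its
    smallest eigenvalue sigma_min^2, so B is A with sigma_min set to zero.
    """
    (a11, a12), (a21, a22) = a
    # Gram matrix G = A^T A = [[p, q], [q, r]]
    p = a11 * a11 + a21 * a21
    q = a11 * a12 + a21 * a22
    r = a12 * a12 + a22 * a22
    mu = (p + r) / 2 - math.hypot((p - r) / 2, q)   # smallest eigenvalue of G
    if q != 0.0:
        v = (q, mu - p)                 # solves (p - mu) v1 + q v2 = 0
    else:
        v = (1.0, 0.0) if p <= r else (0.0, 1.0)
    n = math.hypot(v[0], v[1])
    v = (v[0] / n, v[1] / n)
    av = (a11 * v[0] + a12 * v[1], a21 * v[0] + a22 * v[1])   # A v
    b = [[a[i][j] - av[i] * v[j] for j in range(2)] for i in range(2)]
    return b, math.sqrt(max(mu, 0.0))


def frobenius_distance(x, y):
    return math.sqrt(sum((x[i][j] - y[i][j]) ** 2
                         for i in range(2) for j in range(2)))


A = [[4.0, 0.0], [3.0, 5.0]]
B, sigma_min = nearest_singular_2x2(A)
```

For this particular A the Gram matrix is [[25, 15], [15, 25]], so the smallest singular value is the square root of 10, B works out to [[2, 2], [4, 4]], and the Frobenius distance from A to B equals that singular value.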


7.4.54 Example. Properties of symmetric gauge functions can be exploited
to give a simple characterization of the unitarily invariant norms on $M_n$
that are matrix norms. If $\|\cdot\|$ is a unitarily invariant matrix norm
on $M_n$, then we know from Corollary (5.6.35) that $\|A\| \ge \sigma_1(A)$
for all $A \in M_n$. Using Theorem (5.6.9) and the fact that every unitarily
invariant norm on $M_n$ is self-adjoint (see Problem 2), we can also prove
this directly by observing that
$[\sigma_1(A)]^2 = \rho(A^*A) \le \|A^*A\| \le \|A^*\|\,\|A\| = \|A\|^2$. On
the other hand, let $\|\cdot\|$ be a unitarily invariant norm on $M_n$ such
that $\|A\| \ge \sigma_1(A)$ for all $A \in M_n$, and let $g$ be the symmetric
gauge function on $\mathbf{C}^n$ generated by $\|\cdot\|$. The multiplicative
analog for singular values of Weyl’s inequalities given in Problem 18 of
Section (7.3) and the fact that $g$ is a monotone norm imply that

\[
\begin{aligned}
\|AB\| &= g(\sigma_1(AB), \sigma_2(AB), \ldots, \sigma_n(AB)) \\
  &\le g(\sigma_1(A)\sigma_1(B), \sigma_1(A)\sigma_2(B), \ldots, \sigma_1(A)\sigma_n(B)) \\
  &= \sigma_1(A)\, g(\sigma_1(B), \sigma_2(B), \ldots, \sigma_n(B)) \\
  &= \sigma_1(A)\, \|B\| \le \|A\|\, \|B\|
\end{aligned}
\]

Thus, a unitarily invariant norm $\|\cdot\|$ on $M_n$ is a matrix norm if and
only if $\|A\| \ge \sigma_1(A) = |||A|||_2$ for all $A \in M_n$. In particular,
all the Ky Fan $k$ norms for $k = 1, 2, \ldots, n$ and all the Schatten
$p$-norms for $p \ge 1$ [generated by the symmetric gauge functions in
(7.4.4) and (5.2.4), respectively] are matrix norms. Another consequence of
this characterization is that the set of unitarily invariant matrix norms
on $M_n$ is convex. The set of all matrix norms on $M_n$ is not convex [see
Problem 9 of Section (5.6)].
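The characterization in Example (7.4.54) can be illustrated with a small experiment. The sketch below uses real 2-by-2 matrices and the trace norm (the Ky Fan 2-norm), for which the closed form below holds because the squared sum of the two singular values equals the squared Frobenius norm plus twice the absolute value of the determinant. It then scales that norm by 0.4, an arbitrary illustrative factor, so that the condition that the norm dominates the largest singular value fails, and exhibits a product that violates submultiplicativity. The helper names are hypothetical, and random sampling is evidence, not proof.

```python
import math
import random


def trace_norm(m):
    """sigma_1 + sigma_2 for a real 2x2 matrix (the Ky Fan 2-norm)."""
    (a, b), (c, d) = m
    t = a * a + b * b + c * c + d * d
    return math.sqrt(t + 2 * abs(a * d - b * c))


def scaled_trace_norm(m):
    """Still unitarily invariant, but smaller than sigma_1 on rank-one matrices."""
    return 0.4 * trace_norm(m)


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def submultiplicative_on_samples(norm, trials=2000, seed=1):
    """True if ||AB|| <= ||A|| ||B|| held on every random sample."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [[rng.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
        b = [[rng.uniform(-3, 3) for _ in range(2)] for _ in range(2)]
        if norm(matmul(a, b)) > norm(a) * norm(b) + 1e-9:
            return False
    return True


# E11 has singular values 1 and 0, and E11 * E11 = E11.
E11 = [[1.0, 0.0], [0.0, 0.0]]
lhs = scaled_trace_norm(matmul(E11, E11))             # 0.4
rhs = scaled_trace_norm(E11) * scaled_trace_norm(E11)  # 0.16
```

Here `lhs` exceeds `rhs`, so the scaled norm is not a matrix norm, consistent with its value 0.4 on E11 falling below the largest singular value 1.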


Problems 


1. Let $A \in M_{m,n}$ have rank $k > 0$. Suppose it is desired to find a
matrix $A_1 \in M_{m,n}$ that has rank $k_1 < k$ and most closely approximates
$A$ in the Frobenius norm. Show that this can be done as follows: Let
$A = V \Sigma W^*$ be a singular value decomposition of $A$. Let $\Sigma_1$ be
the same as $\Sigma$ except that only $\sigma_1, \ldots, \sigma_{k_1}$ are used;
the remaining $n - k_1$ “diagonal” entries of $\Sigma_1$ are zero. Then
$A_1 = V \Sigma_1 W^*$ has the required property. Hint: Use (7.4.15).
Notice that (7.4.52) shows that the given approximation is “best” not only
for the Frobenius norm, but for all unitarily invariant norms as well.


2. A norm $\|\cdot\|$ on $M_n$ is said to be self-adjoint if
$\|A\| = \|A^*\|$ for every $A \in M_n$. Use Theorem (7.4.24) to show that
every unitarily invariant norm on $M_n$ is self-adjoint. Give an example of
a self-adjoint norm that is not unitarily invariant.

3. Use Theorem (7.4.10) and the methods of Example (7.4.6) to determine
a best least-squares approximation to a given matrix $A \in M_{m,n}$ (with
$m \le n$) by a scalar multiple of a matrix $Y \in M_{m,n}$ with orthonormal
rows. Hint: Show that such a matrix $Y$ must have the form $Y = VDW$, where
$V \in M_m$ and $W \in M_n$ are unitary, $D = [I \;\; 0] \in M_{m,n}$,
$I \in M_m$, and $0 \in M_{m,n-m}$. The problem of minimizing
$\|A - cY\|_2^2$ is the same as the problem of minimizing
$\|A\|_2^2 - (\operatorname{Re} \operatorname{tr} AY^*)^2/m$. If
$A = V_1 \Sigma W_1^*$ is a singular value decomposition of $A$, show that this
minimization problem requires finding

\[
\max\{ \operatorname{Re} \operatorname{tr} \Sigma W D^* V : W \in M_n \text{ and } V \in M_m \text{ are unitary} \}
\]

and use Theorem (7.4.10) to solve this problem as in Example (7.4.13).
Show that the value of the error in this case has the same form as in
Example (7.4.6).

4. Consider diagonal matrices $A, B \in M_n$ to show that all possible
permutations can arise in (7.4.11).

5. Consider the function $u(A)$ defined in (7.4.7). Show that

\[
u(A) \le \sqrt{n}\, \|A\|_2 \quad \text{for all } A \in M_n
\]

and that this bound is sharp. Use the definition to show directly that
$u(A)$ is a vector norm on $M_n$, and explain why $u(A)$ is actually a matrix
norm on $M_n$. Hint: See Example (7.4.54).

6. Show that if $A \in M_n$ is nonsingular and if
$\kappa(A) = |||A|||_2\, |||A^{-1}|||_2$ is the condition number of $A$ with
respect to the spectral norm, then $\kappa(A) = \sigma_1/\sigma_n$, the ratio
of the largest and smallest singular values. How does this compare with
the estimate $\kappa(A) \ge |\lambda_1/\lambda_n|$?

7. Show that the constant in Kantorovich’s inequality (7.4.42) is the
square of the ratio of the geometric mean of $\lambda_1$ and $\lambda_n$ to the
arithmetic mean of $\lambda_1$ and $\lambda_n$.

8. Let $A \in M_n$ be nonsingular and Hermitian. Use Kantorovich’s
inequality (7.4.40) to show that

\[
\frac{\|Ax\|_2\, \|A^{-1}x\|_2}{\|x\|_2^2} \le \frac{\sigma_1^2 + \sigma_n^2}{2\sigma_1\sigma_n}
  = \frac{1}{2}\Big(\frac{\sigma_1}{\sigma_n} + \frac{\sigma_n}{\sigma_1}\Big)
\]

where $\sigma_1 \ge \cdots \ge \sigma_n > 0$ are the singular values of $A$.
Show that $\sigma_1$ and $\sigma_n$ are the absolute values, respectively, of
the eigenvalues of $A$ of largest and smallest absolute value, and that

\[
\max_{x \ne 0} \frac{\|Ax\|_2\, \|A^{-1}x\|_2}{\|x\|_2^2}
  = \frac{1}{2}\Big(\frac{\sigma_1}{\sigma_n} + \frac{\sigma_n}{\sigma_1}\Big)
  = \frac{1}{2}\Big(\kappa + \frac{1}{\kappa}\Big)
\]

where $\kappa$ is the spectral condition number of $A$. Exhibit a vector $x$
for which the maximum is achieved. Use the definition of the spectral
condition number and its relation to the above-defined maximum to explain
why one must have

\[
\frac{1}{2}\Big(\frac{\sigma_1}{\sigma_n} + \frac{\sigma_n}{\sigma_1}\Big) \le \frac{\sigma_1}{\sigma_n}
\]

Prove this inequality directly. Hint: Show that
$f(x) = x - [x + (1/x)]/2$ is an increasing function for $x \ge 1$.
9. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be $n$ given positive real
numbers. Use Kantorovich’s inequality (7.4.42) to prove that if
$a_1, \ldots, a_n$ are nonnegative and sum to 1, then

\[
\Big(\sum_{i=1}^{n} a_i \lambda_i\Big)\Big(\sum_{i=1}^{n} \frac{a_i}{\lambda_i}\Big)
  \le \frac{(\lambda_{\max} + \lambda_{\min})^2}{4\, \lambda_{\max} \lambda_{\min}}
\]

10. Prove the following generalization of Kantorovich’s inequality
(7.4.42) due to Greub and Rheinboldt: Let $B, C \in M_n$ be commuting
positive definite matrices with eigenvalues
$0 < \lambda_1 \le \cdots \le \lambda_n$ and $0 < \mu_1 \le \cdots \le \mu_n$,
respectively. Then

\[
(x^*BCx)^2 \ge \frac{4\lambda_1\lambda_n\mu_1\mu_n}{(\lambda_1\mu_1 + \lambda_n\mu_n)^2}\, (x^*B^2x)(x^*C^2x)
\]

for all $x \in \mathbf{C}^n$. Hint: Since $B = U\Lambda U^*$ and $C = UMU^*$ for
some unitary $U \in M_n$, write the asserted inequality first in terms of
$y = U^*x$ and then in terms of $z = (\Lambda M)^{1/2} y$. Then apply (7.4.41)
with $B = \Lambda M^{-1}$ to show that the asserted inequality holds (and is
sharp) with a constant of the form

\[
\frac{4\lambda_j\lambda_k\mu_j\mu_k}{(\lambda_j\mu_j + \lambda_k\mu_k)^2}
\]

for some choice of indices $1 \le j \ne k \le n$. Show that the least constant
of this form occurs for $j = 1$ and $k = n$. This final generalized inequality
might not be sharp, however.

11. In contrast to the Kantorovich inequality (7.4.42), show that

\[
(x^*Bx)(x^*B^{-1}x) \ge \|x\|_2^4 \quad \text{for all } x \in \mathbf{C}^n
\]

if $B \in M_n$ is positive definite. More generally, show that

\[
(x^*Bx)(y^*B^{-1}y) \ge |x^*y|^2 \quad \text{for all } x, y \in \mathbf{C}^n
\]

if $B \in M_n$ is positive definite, with equality for $x = B^{-1}y$. Conclude
that

\[
(x^*x)^2 \le (x^*Bx)(x^*B^{-1}x) \le \frac{[(\lambda_1 + \lambda_n)/2]^2}{\lambda_1\lambda_n}\, (x^*x)^2 \quad \text{for all } x \in \mathbf{C}^n
\]

Hint: Use the Cauchy-Schwarz inequality to show that

\[
\Big|\sum_{i=1}^{n} \bar{x}_i y_i\Big|^2 \le \Big(\sum_{i=1}^{n} \lambda_i |x_i|^2\Big)\Big(\sum_{i=1}^{n} \frac{|y_i|^2}{\lambda_i}\Big)
\]

if $\lambda_i > 0$, and then write $B = U\Lambda U^*$.

12. Let $B \in M_n$ be positive definite, let $y \in \mathbf{C}^n$ be any
nonzero vector, and define

\[
f(B, y) \equiv \min\Big\{ \frac{x^*Bx}{|x^*y|^2} : x \in \mathbf{C}^n \text{ and } x^*y \ne 0 \Big\}
\]

Show that $f(B, y)$ is well defined and use Problem 11 to show that
$f(B, y) = 1/y^*B^{-1}y$. Show that $f$ has the superadditivity property

\[
f(A + B, y) \ge f(A, y) + f(B, y)
\]

for all nonzero $y \in \mathbf{C}^n$ and all positive definite $A, B \in M_n$.
Now let $y = e_i$, the $i$th standard unit basis vector, and deduce
Bergstrom’s inequality

\[
\frac{\det(A + B)}{\det(A_i + B_i)} \ge \frac{\det A}{\det A_i} + \frac{\det B}{\det B_i}
\]

valid for any positive definite matrices $A, B \in M_n$, where
$A_i \in M_{n-1}$ denotes the principal submatrix of $A$ obtained by deleting
the $i$th row and column of $A$, and similarly for $B_i$. This approach to
Bergstrom’s inequality is an example of a widely useful technique known as
quasi-linearization: express a nonlinear function of a quantity of
interest as the constrained extremal value of another function that
depends linearly (or perhaps only additively) on the quantity of interest.
The crucial step in (7.4.24) (proving that the pre-norm on $M_{m,n}$ defined
in terms of a symmetric gauge function of the singular values is actually a
norm) was accomplished with the quasi-linearization in (5.4.12).


13. For any complex number $z$, the inequality
$|z - \operatorname{Re} z| \le |z - x|$ holds for all real numbers $x$. A
plausible generalization of this to square matrices $A \in M_n$ is

\[
\|A - \tfrac{1}{2}(A + A^*)\| \le \|A - H\|
\]

for all Hermitian matrices $H \in M_n$. Prove that this inequality holds for
all unitarily invariant norms $\|\cdot\|$ and, more generally, for all
self-adjoint norms. Conclude that the distance (with respect to
$\|\cdot\|$) from a given matrix $A \in M_n$ to the closed set of Hermitian
matrices in $M_n$ is $\tfrac{1}{2}\|A - A^*\|$. Hint:
$A - \tfrac{1}{2}(A + A^*) = \tfrac{1}{2}(A - H) + \tfrac{1}{2}(H - A^*)$, so
$\|A - \tfrac{1}{2}(A + A^*)\| \le \tfrac{1}{2}\|A - H\| + \tfrac{1}{2}\|H - A^*\|$.


14. For any complex number $z$, we have the inequality
$|\operatorname{Re} z| \le |z|$. Demonstrate the trivial generalization
$\|(A + A^*)/2\| \le \|A\|$ for all $A \in M_n$ and for all unitarily
invariant (even self-adjoint) norms $\|\cdot\|$.

15. Let $A \in M_n$ be given, let $\lambda_1 \le \cdots \le \lambda_n$ be the
ordered eigenvalues of $\tfrac{1}{2}(A + A^*)$, and let
$\sigma_1 \ge \cdots \ge \sigma_n$ be the ordered singular values of $A$.
Explain why the inequalities

\[
\lambda_{n-k+1}\big(\tfrac{1}{2}[A + A^*]\big) \le \sigma_k(A), \quad k = 1, \ldots, n
\]

may be thought of as a generalization of the inequality
$\operatorname{Re} z \le |z|$ for complex numbers. This says that the $k$th
largest singular value of $A$ is greater than or equal to the $k$th largest
eigenvalue of $\tfrac{1}{2}(A + A^*)$. Hint: If $y$ is a Euclidean unit
vector, then

\[
\tfrac{1}{2}\, y^*(A + A^*)y = \operatorname{Re} y^*Ay \le \|Ay\|_2
\]

Use the Courant-Fischer theorem (4.2.11) to express $\lambda_{n-k+1}$, and
then use this inequality and (7.3.10) to get $\sigma_k$.

16. Let $A \in M_n$ be given, and let $\|\cdot\|$ be a unitarily invariant
norm on $M_n$. Use (7.4.51) to show that $\|A - U\| \ge \|\Sigma(A) - I\|$ for
any unitary $U \in M_n$ and that this inequality is sharp. Conclude that
$\|\Sigma(A) - I\|$ is the distance (with respect to $\|\cdot\|$) from $A$ to
the compact set of unitary matrices in $M_n$.

17. Let $A \in M_n$ have a singular value decomposition
$A = V \Sigma(A) W^*$ and let $\|\cdot\|$ be a unitarily invariant norm on
$M_n$. Show that

\[
\|\Sigma(A) - I\| \le \|A - U\| \le \|\Sigma(A) + I\|
\]

for any unitary $U \in M_n$. Hint: Show that $\Sigma(U) = I$ in any singular
value decomposition of any unitary matrix $U$, so that the lower bound
follows immediately from (7.4.51). For the upper bound, use the singular
value analog of Weyl’s additive eigenvalue inequalities in Problem 16 in
Section (7.3) to show that
$\sigma_{i+j-1}(A + (-U)) \le \sigma_i(A) + \sigma_j(-U)$. Then use (7.4.48).

18. With the inequality in Example (7.4.53) for nonsingular $A$ as a
guide, find a sharp lower bound for $\|A - B\|$, where $A \in M_n$ is a given
matrix of rank $k_1$, $B \in M_n$ is an arbitrary matrix of rank $k < k_1$, and
$\|\cdot\|$ is a unitarily invariant norm.


Further Readings. The original version of Theorem (7.4.24) in the case
m = n is due to Von Neumann; see the paper cited in Section (5.4). The
treatment given of the inequalities of Wielandt and Kantorovich is
adapted from [Hou 64], which has many references to the original papers.
For generalizations and further references see A. Clausing,
“Kantorovich-Type Inequalities,” Amer. Math. Monthly 89 (1982), 314-320.
The approach to Bergstrom’s inequality in Problem 12 is taken from [BB],
which has a long chapter (with copious references) devoted to inequalities
arising from positive definite matrices; there is also a discussion, and
many examples, of the method of quasi-linearization. For more information
about inequalities valid for all unitarily invariant norms, see
L. Mirsky, “Symmetric Gauge Functions and Unitarily Invariant Norms,”
Quart. J. Math. Oxford 11(2) (1960), 50-59 as well as K. Fan and A. J.
Hoffman, “Some Metric Inequalities in the Space of Matrices,” Proc.
Amer. Math. Soc. 6 (1955), 111-116. As an example of how these results
are applied in statistics, and for further references to the statistics
literature, see C. R. Rao, “Matrix Approximations and Reduction of
Dimensionality in Multivariate Statistical Analysis,” Multivariate
Analysis-V, Proceedings of the Fifth International Symposium on
Multivariate Analysis, P. R. Krishnaiah, North-Holland, Amsterdam, 1980,
pp. 1-22.


7.5 The Schur product theorem 


A particularly simple (and seemingly naive) composition of matrices is 
component-wise multiplication. 


7.5.1 Definition. If $A = [a_{ij}] \in M_{m,n}$ and $B = [b_{ij}] \in M_{m,n}$
are given, then the Hadamard product of $A$ and $B$ is the matrix
$A \circ B \equiv [a_{ij} b_{ij}] \in M_{m,n}$.

The Hadamard product is often called the Schur product. Like matrix
addition, Hadamard multiplication is commutative, and it is considerably
simpler than the usual matrix multiplication.
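A minimal sketch of Definition (7.5.1), assuming matrices are stored as lists of rows; the function names are hypothetical. It also contrasts the entry-wise product with the usual matrix product on one small example.

```python
def hadamard(a, b):
    """Entry-wise (Hadamard, or Schur) product of two matrices of equal shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("Hadamard product needs matrices of the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Usual matrix product, for comparison."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

entrywise = hadamard(A, B)   # [[1*5, 2*6], [3*7, 4*8]]
usual = matmul(A, B)         # rows of A against columns of B
```

Hadamard multiplication commutes because it reduces to scalar multiplication in each entry, so `hadamard(A, B)` and `hadamard(B, A)` agree, whereas `matmul(A, B)` and `matmul(B, A)` differ for this pair.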


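The Hadamard product arises naturally from several different points
of view. For example, if $f(\theta)$ and $g(\theta)$ are continuous periodic
functions of period $2\pi$ and if

\[
a_k = \int_0^{2\pi} e^{ik\theta} f(\theta)\, d\theta \quad \text{and} \quad
b_k = \int_0^{2\pi} e^{ik\theta} g(\theta)\, d\theta
\]

$k = 0, \pm 1, \pm 2, \ldots$, then the convolution

\[
h(\theta) \equiv \int_0^{2\pi} f(\theta - t)\, g(t)\, dt
\]

has trigonometric moments

\[
c_k = \int_0^{2\pi} e^{ik\theta} h(\theta)\, d\theta
\]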


that satisfy the identities $c_k = a_k b_k$, $k = 0, \pm 1, \pm 2, \ldots$.
Thus, the Toeplitz matrix of trigonometric moments of $h(\theta)$ is the
Hadamard product of the Toeplitz matrices of trigonometric moments of
$f(\theta)$ and $g(\theta)$:

\[
[c_{i-j}] = [a_{i-j}] \circ [b_{i-j}]
\]

If $f(\theta)$ and $g(\theta)$ are both nonnegative real-valued functions,
then the convolution $h(\theta)$ is also a nonnegative real-valued function.
Therefore, as shown in (7.0.5), the matrices $[a_{i-j}]$, $[b_{i-j}]$, as well
as $[c_{i-j}]$ are positive semidefinite. This is an instance of the Schur
product theorem: The Hadamard product of two positive semidefinite
matrices is positive semidefinite.

As another example, consider the integral operator

\[
K(f) \equiv \int_a^b K(x, y)\, f(y)\, dy
\]

where the kernel $K(x, y)$ is a continuous function on the finite interval
$[a, b] \times [a, b]$ and $f \in C[a, b]$. If one has a second kernel
$H(x, y)$, then one could consider the (point-wise) product kernel
$L(x, y) \equiv K(x, y) H(x, y)$ and the associated integral operator

\[
L(f) \equiv \int_a^b L(x, y)\, f(y)\, dy = \int_a^b K(x, y)\, H(x, y)\, f(y)\, dy
\]

The linear mapping $f \to K(f)$ is in a natural way a limit of matrix-vector
multiplications (approximate the integral as a finite Riemann sum), and
many properties of integral operators can be deduced by taking
appropriate limits of results known for matrices. The (point-wise) product
of integral kernels leads to an integral operator that is, from this point
of view, a natural continuous analog of the Hadamard product of matrices.

If the integral kernel $K(x, y)$ has the property that

\[
\int_a^b \int_a^b K(x, y)\, \bar{f}(x)\, f(y)\, dx\, dy \ge 0
\]

for all $f \in C[a, b]$, then $K(x, y)$ is said to be a positive semidefinite
kernel. It is a classical result (Mercer’s theorem) that if $K(x, y)$ is a
continuous positive semidefinite kernel on a finite interval $[a, b]$, then
there exist positive real numbers $\{\lambda_i\}$ (known as “eigenvalues”)
and continuous functions $\{\phi_i(x)\}$ (known as “eigenfunctions”) such that

\[
K(x, y) = \sum_{i=1}^{\infty} \frac{\phi_i(x)\, \bar{\phi}_i(y)}{\lambda_i} \quad \text{on } [a, b] \times [a, b]
\]

and the series converges absolutely and uniformly.

If $K(x, y)$ and $H(x, y)$ are both continuous positive semidefinite
kernels on the same finite interval $[a, b]$, then $H(x, y)$ also has an
absolutely and uniformly convergent representation

\[
H(x, y) = \sum_{i=1}^{\infty} \frac{\psi_i(x)\, \bar{\psi}_i(y)}{\mu_i} \quad \text{on } [a, b] \times [a, b]
\]

with all $\mu_i > 0$. By direct multiplication of the respective series, the
(point-wise) product kernel $L(x, y) = K(x, y) H(x, y)$ has the
representation

\[
L(x, y) = \sum_{i,j=1}^{\infty} \frac{\phi_i(x)\, \psi_j(x)\, \bar{\phi}_i(y)\, \bar{\psi}_j(y)}{\lambda_i \mu_j} \quad \text{on } [a, b] \times [a, b]
\]

which also converges absolutely and uniformly. Then

\[
\int_a^b \int_a^b L(x, y)\, \bar{f}(x)\, f(y)\, dx\, dy
  = \sum_{i,j=1}^{\infty} \frac{1}{\lambda_i \mu_j} \Big| \int_a^b \bar{\phi}_i(y)\, \bar{\psi}_j(y)\, f(y)\, dy \Big|^2 \ge 0
\]

so $L(x, y)$ is also positive semidefinite. This is another instance of the
Schur product theorem.

Exercise. Show that the Hadamard product of two Hermitian matrices is
always Hermitian, but that the usual matrix product of two Hermitian
matrices is Hermitian if and only if they commute.

Exercise. Consider the matrices A= É i] and B= f 3]. Show that
$A$, $B$, and $A \circ B$ are positive semidefinite, but that the usual matrix
product $AB$ is not positive semidefinite. Show that the eigenvalues of $AB$
are positive, however.

The main reason for introducing the Hadamard product at this point
is that it (unlike the usual matrix product) leaves invariant the cone of
positive semidefinite matrices and provides yet another analogy between
positive semidefinite matrices and nonnegative real numbers. We begin
with an observation of independent interest.

Any matrix may be written as a sum of rank 1 matrices, in which the
number of summands is equal to the rank, but for a positive semidefinite
matrix the summands may also be chosen to be positive semidefinite.

7.5.2 Theorem. If $A \in M_n$ is a positive semidefinite matrix of rank $k$,
then $A$ may be written in the form

\[
A = v_1 v_1^* + v_2 v_2^* + \cdots + v_k v_k^*
\]


where each $v_i \in \mathbf{C}^n$ and the set $\{v_1, \ldots, v_k\}$ is an
orthogonal set of nonzero vectors.

Proof: Use the spectral theorem to write $A = U \Lambda U^*$ and let $v_i$ be
$\lambda_i^{1/2}$ times the $i$th column of $U$. □

Our main result is often called the Schur product theorem.

7.5.3 Theorem. If $A, B \in M_n$ are positive semidefinite matrices, then
$A \circ B$ is also positive semidefinite. Moreover, if both $A$ and $B$ are
positive definite, then so is $A \circ B$.

Proof: Use (7.5.2) to write $A = v_1 v_1^* + \cdots + v_k v_k^*$ and
$B = w_1 w_1^* + \cdots + w_m w_m^*$, where $k = \operatorname{rank} A$ and
$m = \operatorname{rank} B$. Observe that

\[
A \circ B = \sum_{i,j=1}^{k,m} u_{ij} u_{ij}^*
\]

where $u_{ij} \equiv v_i \circ w_j$. Thus, $A \circ B$ is positive semidefinite
because it is a sum of (rank 1) positive semidefinite matrices.

If $A$ and $B$ are both positive definite, then $k = m = n$ and the sets
$\{v_i\}$ and $\{w_i\}$ are both orthogonal bases of $\mathbf{C}^n$. If
$A \circ B$ were singular, there would be some nonzero vector $x$ such that
$(A \circ B)x = 0$ and hence

\[
x^*(A \circ B)x = \sum_{i,j=1}^{k,m} x^*(u_{ij} u_{ij}^*)x = \sum_{i,j=1}^{k,m} |x^* u_{ij}|^2 = 0
\]

But then each term must vanish separately, and hence

\[
|x^* u_{ij}|^2 = |x^*(v_i \circ w_j)|^2 = |(x \circ \bar{v}_i)^* w_j|^2 = 0
\]

for all $i$ and $j$. This means that for each $i$ the vector
$x \circ \bar{v}_i$ is orthogonal to all the vectors $w_1, w_2, \ldots, w_n$ and
therefore $x \circ \bar{v}_i = 0$ for all $i = 1, 2, \ldots, n$. In particular,
this implies that $v_i^* x = 0$ for all $i = 1, \ldots, n$. Since this means
that $x$ is orthogonal to all the elements of a basis, we must have $x = 0$.
We conclude that $A \circ B$ must be nonsingular. □

Exercise. Let $A, B \in M_n$. Use the proof of (7.5.3) to show that
$\operatorname{rank} A \circ B \le (\operatorname{rank} A)(\operatorname{rank} B)$
whenever $A$ and $B$ are positive semidefinite. In particular, show that if
$(\operatorname{rank} A)(\operatorname{rank} B) < n$, then $A \circ B$ must be
singular.

Exercise. Show that the assertions of the previous exercise are still
correct when $A$ and $B$ are Hermitian matrices that are not necessarily
positive semidefinite.

Exercise. Consider the matrices A = [ | and B= | o o] and show that
$\operatorname{rank} A \circ B$ can be zero even when $A$ and $B$ both have
positive rank.

Exercise. Show that if $A$ is positive definite and $B$ is negative definite,
then $A \circ B$ is negative definite.

7.5.4 Corollary (Fejer’s theorem). Let $A = [a_{ij}] \in M_n$. Then $A$ is
positive semidefinite if and only if

\[
\sum_{i,j=1}^{n} a_{ij} b_{ij} \ge 0
\]

for all positive semidefinite matrices $B = [b_{ij}] \in M_n$.

Proof: Suppose $A$ and $B$ are positive semidefinite, and let
$x \in \mathbf{C}^n$ be a vector with all components equal to 1. Then
$A \circ B$ is positive semidefinite, and the indicated sum is just
$x^*(A \circ B)x$, which must be nonnegative. Conversely, if
$\sum a_{ij} b_{ij} \ge 0$ whenever $B$ is positive semidefinite, just set
$B = [b_{ij}] \equiv [\bar{x}_i x_j]$ for any given vector
$x \in \mathbf{C}^n$. Then $B$ is positive semidefinite and

\[
\sum_{i,j=1}^{n} a_{ij} b_{ij} = \sum_{i,j=1}^{n} a_{ij} \bar{x}_i x_j = x^*Ax \ge 0
\]

Since $x \in \mathbf{C}^n$ was arbitrary, we conclude that $A$ is positive
semidefinite. □

7.5.5 Application. Let $D \subset \mathbf{R}^n$ be an open bounded set. The
second-order linear differential operator $L$ defined by

\[
Lu \equiv \sum_{i,j=1}^{n} a_{ij}(x) \frac{\partial^2 u}{\partial x_i\, \partial x_j}
  + \sum_{i=1}^{n} b_i(x) \frac{\partial u}{\partial x_i} + c(x)u \tag{7.5.6}
\]

is said to be elliptic in $D$ if the matrix $A(x) \equiv [a_{ij}(x)]$ is
positive definite for all $x \in D$. Suppose there is some $u \in C^2(D)$ that
is continuous on the closure of $D$ and satisfies the equation $Lu = 0$ in
$D$. What can we say about the local maxima or minima of the function $u$ in
$D$? Suppose $y \in D$ is a local minimum for $u$. Then

\[
\frac{\partial u}{\partial x_i}\Big|_{y} = 0 \quad \text{for all } i = 1, 2, \ldots, n
\]

and the Hessian matrix

\[
\Big[ \frac{\partial^2 u}{\partial x_i\, \partial x_j} \Big]
\]


is positive semidefinite at $y$. Therefore,

\[
Lu = 0 = \sum_{i,j=1}^{n} a_{ij} \frac{\partial^2 u}{\partial x_i\, \partial x_j} + cu
\]

at the point $y$, and by Fejer’s theorem (7.5.4) the sum involving the
second derivatives must be nonnegative. Thus, the term $c(y)u(y)$ must
be nonpositive. In particular, if $c(y) < 0$, then it cannot be that
$u(y) < 0$. A similar argument shows that $u(y)$ cannot be positive at an
interior relative maximum $y$ if $c(y) < 0$. These simple observations are
the heart of the following important principle.

7.5.7 Weak minimum principle. Let the operator $L$ defined by (7.5.6)
be elliptic in $D$, and suppose that $c(x) < 0$ in $D$. If $u \in C^2(D)$
satisfies $Lu = 0$ in $D$, then $u$ cannot have a negative interior relative
minimum nor a positive interior relative maximum. If, in addition, $u$ is
continuous on the closure of $D$ and $u$ is nonnegative on the boundary of
$D$, then $u$ must be nonnegative everywhere in $D$.

From the minimum principle follows one of the fundamental uniqueness
theorems for partial differential equations:

7.5.8 Fejer’s uniqueness theorem. Suppose the operator $L$ defined by
(7.5.6) is elliptic, assume that $c(x) < 0$ in $D$, and consider the following
boundary value problem:

    $Lu = f$ in $D$, $f$ a given function
    $u = g$ on $\partial D$, $g$ a given function
    $u$ is twice continuously differentiable in $D$
    $u$ is continuous on the closure of $D$

Then there is at most one solution to this problem.

Proof: If $u_1$ and $u_2$ were two solutions to this problem, then the
function $v \equiv u_1 - u_2$ is a solution to a problem of the same type,
but with zero boundary conditions and $Lv = 0$ in $D$. By the weak minimum
principle, $v$ must be nonnegative in $D$. Applying the same argument to
$-v$, we find that $v$ must also be nonpositive in $D$, and hence $v = 0$
in $D$. □

Exercise. Explain how the weak minimum principle and the Fejer
uniqueness theorem apply to the partial differential equation
$\nabla^2 u - \lambda u = 0$ in $D \subset \mathbf{R}^n$, where $\lambda$ is a
positive real parameter.

A final corollary of the Schur product theorem is easily proved. If
$A = [a_{ij}] \in M_n$ is positive semidefinite, then
$A \circ A = [a_{ij}^2]$ is also positive semidefinite. By induction, it
follows that all the positive integer Hadamard powers $[a_{ij}^k]$ are
positive semidefinite for all $k = 1, 2, \ldots$. Since any nonnegative
linear combination of positive semidefinite matrices is positive
semidefinite (7.1.3), this implies that

\[
a_0 J + a_1 A + a_2 A \circ A + \cdots + a_m \underbrace{A \circ A \circ \cdots \circ A}_{m \text{ times}}
  = [a_0 + a_1 a_{ij} + a_2 a_{ij}^2 + \cdots + a_m a_{ij}^m] = [p(a_{ij})]
\]

is positive semidefinite whenever all $a_i \ge 0$ (here $J$ denotes the
matrix all of whose entries are 1);
$p(x) = a_0 + a_1 x + \cdots + a_m x^m$ is a polynomial with nonnegative
coefficients. More generally, if

\[
f(z) = \sum_{k=0}^{\infty} a_k z^k
\]

is an analytic function with all $a_k \ge 0$ and radius of convergence
$R > 0$, then a simple limiting argument shows that
$[f(a_{ij})] \in M_n$ is positive semidefinite whenever all $|a_{ij}| < R$.
Perhaps the simplest example is $f(z) = e^z$, whose power series converges
for all $z \in \mathbf{C}$ and whose coefficients are $a_k = 1/k! > 0$. By this
argument, $[e^{a_{ij}}]$ is a positive semidefinite matrix whenever
$A = [a_{ij}] \in M_n$ is positive semidefinite. This result can be
improved; weaker conditions on $A$ are sufficient to ensure that the
entry-wise exponential of $A$ is positive semidefinite. See [HJ].

7.5.9 Corollary. Let $A = [a_{ij}] \in M_n$ be positive semidefinite. Then

(a) The matrix $[a_{ij}^k]$ is positive semidefinite for all
    $k = 1, 2, \ldots$.

(b) If $f(z) = a_0 + a_1 z + a_2 z^2 + \cdots$ is an analytic function with
    nonnegative coefficients and radius of convergence $R > 0$, then the
    matrix $[f(a_{ij})]$ is positive semidefinite if all $|a_{ij}| < R$.

Problems

1. Show that if $H(A)$ (the Hermitian part of $A$) is positive definite and
$B$ is positive definite, then $H(A \circ B)$ is positive definite.

2. If $A = [a_{ij}] \in M_n$ is positive semidefinite, show that the matrix
$[|a_{ij}|^2]$ is also positive semidefinite. Hint: Consider
$A \circ \bar{A}$.


3. If A= laij] € Mn is positive semidefinite, show that the matrix 
[e® “A] is positive semidefinite for all `e R. 




4. If $A = [a_{ij}] \in M_n$ is positive semidefinite, then the positive
integer Hadamard power matrices $[a_{ij}^k]$ and the Hadamard absolute value
squared matrix $A \circ \bar{A}$ are always positive semidefinite. But what
about the Hadamard absolute value matrix $|A| \equiv [|a_{ij}|]$? (a) Suppose
$A \in M_n$ is positive definite. For $n = 1, 2, 3$ use the determinant
criterion (7.2.5) to show directly that $|A|$ is positive definite. Obtain
the result for positive semidefinite $A$ ($n = 1, 2, 3$ only) by passing to
the limit. (b) Use the fact that $f(x) = \cos(x)$ is a positive definite
function [or write $\cos(x) = (e^{ix} + e^{-ix})/2$ and compute the quadratic
form explicitly] to show that the matrix $A = [\cos(x_i - x_j)]$ is positive
semidefinite for all choices of $\{x_1, x_2, \ldots, x_n\} \subset \mathbf{R}$
and for all $n = 1, 2, \ldots$. (c) Let $n = 4$, and let $x_1 = 0$,
$x_2 = \pi/4$, $x_3 = \pi/2$, and $x_4 = 3\pi/4$. Explicitly compute the
(necessarily positive semidefinite) matrix $A$ in (b) in this case, and
observe that it is a Toeplitz matrix. Compute $|A|$ and $\det|A|$, and show
that $|A|$ cannot be positive semidefinite.

5. Consider the matrix $|A|$ in Problem 4, and show that
$B \equiv |A| \circ |A|$ is an example of a positive semidefinite matrix whose
nonnegative “Hadamard square root” is not positive semidefinite. Contrast
this with the situation for the ordinary square root $B^{1/2}$.


6. Consider the matrix $A \in M_4$ given by

\[
A = \begin{bmatrix}
10 & 3 & -2 & 1 \\
3 & 10 & 0 & 9 \\
-2 & 0 & 10 & 4 \\
1 & 9 & 4 & 10
\end{bmatrix}
\]

Show that $A$ is positive definite but that $|A|$ is not positive
semidefinite.


7. Let $K(x, y)$ be a continuous integral kernel on the finite interval
$[a, b]$. Show that $K(x, y)$ is a positive semidefinite kernel if and only
if the matrix $[K(x_i, x_j)] \in M_n$ is positive semidefinite for all choices
of the points $\{x_i\}_{i=1}^{n} \subset [a, b]$ and all $n = 1, 2, \ldots$.
Hint: To show that the matrix condition is sufficient, consider a Riemann
sum approximation to the integral

\[
\int_a^b \int_a^b K(x, y)\, \bar{f}(x)\, f(y)\, dx\, dy
  \approx \sum_{i,j=1}^{n} K(x_i, x_j)\, \bar{f}(x_i)\, f(x_j)\, \Delta x_i\, \Delta x_j
\]

To show that the matrix condition is necessary, consider a function

\[
f_\epsilon(x) \equiv \sum_{i=1}^{n} a_i\, \delta_\epsilon(x - x_i)
\]

where $\delta_\epsilon(x)$ is an “approximate delta function,” which is
continuous and nonnegative, vanishes identically outside the interval
$[-\epsilon, \epsilon]$, and satisfies

\[
\int_{-\epsilon}^{\epsilon} \delta_\epsilon(x)\, dx = 1
\]

Now let $\epsilon \to 0$.


8. Use Problem 7 and the Schur product theorem to show that the
(point-wise) product of positive semidefinite integral kernels is positive
semidefinite. This line of argument is relatively elementary and does not
require Mercer’s theorem from the theory of integral equations.

9. Show that a function $\phi \in C(\mathbf{R})$ is a positive definite
function [Problem 8 in Section (7.1)] if and only if
$K(x, y) \equiv \phi(x - y)$ is a positive semidefinite integral kernel.

10. Show that the product $(\phi_1 \phi_2)(x)$ of two positive definite
functions $\phi_1(x)$, $\phi_2(x)$ is a positive definite function.

11. Explain why all the functions

\[
\text{(a)} \quad \frac{\sin(Tx)}{Tx} = \frac{1}{2T} \int_{-T}^{T} e^{itx}\, dt, \quad T > 0
\]

\[
\text{(b)} \quad e^{-x^2} = \frac{1}{2\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2/4}\, e^{itx}\, dt
\]

\[
\text{(c)} \quad e^{-|x|} = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{itx}}{1 + t^2}\, dt
\]

as well as all their mutual products are positive definite functions.

12. Use Problem 11(c) to give an alternate proof that the matrix in
Problem 12 of Section (7.2) is positive definite.

13. If $A = [a_{ij}] \in M_n$ is positive semidefinite, show that the matrix
$[a_{ij}/(i+j)]$ is also positive semidefinite. Hint: Problem 17 of Section
(7.1).

14. Let $A \in M_n$ be positive semidefinite. Show that $x \in \mathbf{C}^n$
satisfies $x^*Ax = 0$ if and only if $Ax = 0$. If $A$ is merely Hermitian,
show by example that it is possible to have $x^*Ax = 0$ and $Ax \ne 0$. Hint:
Write $A = U \Lambda U^*$, so $x^*Ax = 0$ if and only if
$\sum \lambda_i |z_i|^2 = 0$, where $z = U^*x$.

15. A convex cone (with vertex at the origin 0) is a convex set $S$ such
that the ray $\{\lambda x : \lambda \ge 0\} \subset S$ for all $x \in S$. A ray
$\{\lambda x : \lambda \ge 0\}$ in a convex cone $S$ is an extreme ray if
$x = ay + (1-a)z$ for $0 < a < 1$ and $y, z \in S$ only if both $y$ and $z$ lie
on the ray; equivalently, a ray of a convex cone is extreme if it can be
deleted from the cone and the resulting cone is still convex. Show that a
ray $\{\lambda A : \lambda \ge 0\}$ in the convex cone of positive
semidefinite matrices in $M_n$ is an extreme ray if and only if $A$ has rank
1. Theorem (7.5.2) then says that every positive semidefinite matrix is a
convex combination of matrices that lie on extreme rays. Hint: (a) If
$x \in \mathbf{C}^n$ is nonzero, and if $xx^* = aA + (1-a)B$ for some
$a \in (0, 1)$ and positive semidefinite matrices $A, B \in M_n$, let
$\{x_2, \ldots, x_n\} \subset \mathbf{C}^n$ be an orthonormal set such that
$x^*x_k = 0$ for $k = 2, \ldots, n$. Then $0 = x_k^*Ax_k = x_k^*Bx_k$, and hence
each $x_k$ is in the null space of both $A$ and $B$ by Problem 14. Conclude
that both $A$ and $B$ have rank 1 and are positive scalar multiples of
$xx^*$. (b) If $A \in M_n$ is positive semidefinite and has rank $k \ge 2$, use
(7.5.2) to write $A = B + C$, where $B = vv^*$, $v \ne 0$,
$\operatorname{rank} C \ge 1$, and $Cv = 0$. Conclude that $C$ is not a scalar
multiple of $B$ and hence $A$ does not lie on an extreme ray.


7.6 Congruence: products and simultaneous diagonalization 


Unlike multiplication of positive real numbers, the usual matrix multipli- 
cation does not always preserve positive definiteness. The product of two 
Hermitian matrices may not even be Hermitian (it is Hermitian only 
when they commute), and the quadratic form generated by the product 
may not be nonnegative. The particular focus of this section is on posi- 
tive definite matrices; for more general results about Hermitian matrices 


see Section (4.5). 


7.6.1 Example. Let A= E =i] and B= É | so that A and B are 


positive definite. But AB= [3 il is not symmetric, and not even 


H(AB)= lo nm is positive definite. 


At least one positivity property is retained by the usual matrix product
of positive definite matrices, however. Our discussion will illustrate some
useful techniques for dealing with products and sums of matrices.


7.6.2 Definition. Recall that two matrices $A, B \in M_n$ are *congruent if
there exists a nonsingular matrix $C \in M_n$ such that $B = C^*AC$.

Note that, like similarity, *congruence is an equivalence relation.
Sometimes the term conjunctive is used in the complex case to distinguish it
from real congruence.


7.6.3 Theorem. The product of a positive definite matrix $A \in M_n$ and a
Hermitian matrix $B \in M_n$ is a diagonalizable matrix, all of whose
eigenvalues are real. The matrix AB has the same number of positive,
negative, and zero eigenvalues as B. Furthermore, any diagonalizable matrix
with real eigenvalues is the product of a positive definite matrix and a
Hermitian matrix.


Proof: For the first part, note that $A^{-1/2}(AB)A^{1/2} = A^{1/2}BA^{1/2}$, so the latter
matrix is similar to AB and hence has exactly the same eigenvalues. Since
$A^{1/2}$ is Hermitian, the matrix $A^{1/2}BA^{1/2}$ is *congruent to B. Thus, by
Sylvester’s law of inertia (4.5.8), the eigenvalues of B have the same set of
signs as those of $A^{1/2}BA^{1/2}$, and hence of AB. Moreover, since $A^{1/2}BA^{1/2}$
is Hermitian, it is diagonalizable, and hence AB must also be
diagonalizable. For the last assertion, suppose $C \in M_n$ is diagonalizable and
has only real eigenvalues: $C = SDS^{-1}$, with D a real diagonal matrix. Then

\[ C = S(S^*S^{*-1})DS^{-1} = (SS^*)(S^{*-1}DS^{-1}) = AB \]

where $A = SS^*$ is positive definite and $B = S^{*-1}DS^{-1}$ is Hermitian. □


Simultaneous diagonalizability of two matrices by similarity is a rare
event, requiring the strong joint assumption of commutativity.
Simultaneous diagonalization of two Hermitian matrices by joint *congruence,
however, requires much less. Simultaneous diagonalization by *congruence
corresponds to transforming two Hermitian quadratic forms into a
linear combination of squares by a single linear change of variables. The
following result is classical; for a generalization see (4.5.15).


7.6.4 Theorem. Let $A, B \in M_n$ be two Hermitian matrices and suppose
that there is a real linear combination of A and B that is positive definite.
Then there exists a nonsingular $C \in M_n$ such that both $C^*AC$ and $C^*BC$
are diagonal.


Proof: Suppose that $P = \alpha A + \beta B$ is positive definite for some $\alpha, \beta \in \mathbb{R}$.
At least one of $\alpha$ and $\beta$ must be nonzero, so we may assume that $\beta \neq 0$.
But since $B = \beta^{-1}(P - \alpha A)$, if we can show that A and P are
simultaneously diagonalizable by *congruence, then it will follow that A and B
are also. By (7.2.7) we know that P is *congruent to the identity, so there
is some nonsingular $C_1 \in M_n$ such that $C_1^*PC_1 = I$. Since $C_1^*AC_1$ is
Hermitian, there exists a unitary matrix U such that $U^*C_1^*AC_1U = D$ is
diagonal. Letting $C = C_1U$, we have $C^*PC = I$ and $C^*AC = D$, so that
$C^*BC = \beta^{-1}(I - \alpha D)$ is diagonal. □
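The statement of Theorem (7.6.3) about the product of a positive definite matrix and a Hermitian matrix can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random test matrices, the helper `inertia`, and the tolerance are illustrative choices and not part of the text.

```python
import numpy as np

def inertia(values, tol=1e-8):
    """Count (positive, negative, zero) entries of a list of real numbers."""
    values = np.asarray(values, dtype=float)
    pos = int(np.sum(values > tol))
    neg = int(np.sum(values < -tol))
    zero = int(np.sum(np.abs(values) <= tol))
    return (pos, neg, zero)

rng = np.random.default_rng(0)
n = 4

# A: positive definite (X X* plus a multiple of the identity).
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = X @ X.conj().T + n * np.eye(n)

# B: Hermitian with prescribed eigenvalues 2, 1, 0, -3.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
B = U @ np.diag([2.0, 1.0, 0.0, -3.0]) @ U.conj().T
B = (B + B.conj().T) / 2  # remove rounding asymmetry

eig_AB = np.linalg.eigvals(A @ B)   # AB is generally not Hermitian
eig_B = np.linalg.eigvalsh(B)

print("largest imaginary part of an eigenvalue of AB:", np.max(np.abs(eig_AB.imag)))
print("inertia of AB:", inertia(eig_AB.real))
print("inertia of B: ", inertia(eig_B))
```

Up to rounding, the eigenvalues of AB come out real and their sign counts agree with those of B, as the theorem asserts.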




The most common application of this result is to the classical situation 
in mechanics, in which two real symmetric quadratic forms are given, 
one of which is positive definite. 


7.6.5 Corollary. If $A \in M_n$ is positive definite and $B \in M_n$ is Hermitian,
then there exists a nonsingular matrix $C \in M_n$ such that $C^*BC$ is diagonal
and $C^*AC = I$.
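The construction in the proof of (7.6.4) can be turned into a short computation for the situation of Corollary (7.6.5). The sketch below assumes NumPy; the function name `simultaneous_diagonalize` and the 2-by-2 test matrices are hypothetical, and a Cholesky factor is used as one convenient way to make A *congruent to the identity (the text only requires some nonsingular $C_1$).

```python
import numpy as np

def simultaneous_diagonalize(A, B):
    """For A positive definite and B Hermitian, return (C, d) such that
    C* A C = I and C* B C = diag(d)."""
    L = np.linalg.cholesky(A)            # A = L L*, L lower triangular
    L_inv = np.linalg.inv(L)
    C1 = L_inv.conj().T                  # C1* A C1 = L^{-1} A L^{-*} = I
    M = C1.conj().T @ B @ C1             # Hermitian, *congruent to B
    M = (M + M.conj().T) / 2             # remove rounding asymmetry
    d, U = np.linalg.eigh(M)             # M = U diag(d) U*, U unitary
    return C1 @ U, d

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive definite
B = np.array([[1.0, 2.0], [2.0, -1.0]])  # Hermitian, indefinite

C, d = simultaneous_diagonalize(A, B)
print(np.round(C.conj().T @ A @ C, 12))  # identity
print(np.round(C.conj().T @ B @ C, 12))  # diag(d)
```

The diagonal entries d are the eigenvalues of $A^{-1}B$, which ties this construction back to Theorem (7.6.3).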


Exercise. Find a single change of variables such that both quadratic forms
$5x^2 - 2xy + y^2$ and $x^2 + 2xy - y^2$ are weighted sums of squares.


There is an analogous result for a pair of matrices, one of which is
positive definite and the other (complex) symmetric. This result is also
generalized in (4.5.15).


7.6.6 Theorem. If $A \in M_n$ is positive definite and $B \in M_n$ is a
symmetric complex matrix, then there is a nonsingular matrix C such that
$C^*AC$ and $C^TBC$ are both diagonal.


Proof: Choose a nonsingular matrix $C_1 \in M_n$ such that $C_1^*AC_1 = I$. Then
$C_1^TBC_1$ is symmetric, so by Takagi’s factorization (4.4.4) there is a
unitary matrix U such that $U^T(C_1^TBC_1)U = D$, where D is diagonal. Then
$U^*C_1^*AC_1U = I$, too, so we may take $C = C_1U$. □


This result has applications to complex function theory; the Grunsky
inequalities for univalent functions are inequalities between quadratic
forms generated by a positive definite Hermitian matrix and a complex
symmetric matrix.

The following result is an immediate application of (7.6.5). 


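7.6.7 Theorem. The function $f(A) = \log \det A$ is a strictly concave
function on the convex set of positive definite Hermitian matrices in $M_n$.


Proof: For any two given positive definite matrices $A, B \in M_n$ we must
show that

\[ f(\alpha A + (1-\alpha)B) \geq \alpha f(A) + (1-\alpha) f(B) \tag{7.6.8} \]

for all $\alpha \in (0,1)$, with equality if and only if $A = B$. Use (7.6.5) to
write $A = CIC^*$ and $B = C\Lambda C^*$ for some nonsingular $C \in M_n$ and
$\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ with all $\lambda_i > 0$. Then

\[ f(\alpha A + (1-\alpha)B) = f(C[\alpha I + (1-\alpha)\Lambda]C^*) = f(CC^*) + f(\alpha I + (1-\alpha)\Lambda) = f(A) + f(\alpha I + (1-\alpha)\Lambda) \]

and

\[ \begin{aligned}
\alpha f(A) + (1-\alpha) f(B) &= \alpha f(A) + (1-\alpha) f(C\Lambda C^*) \\
&= \alpha f(A) + (1-\alpha)[f(CC^*) + f(\Lambda)] \\
&= \alpha f(A) + (1-\alpha) f(A) + (1-\alpha) f(\Lambda) \\
&= f(A) + (1-\alpha) f(\Lambda)
\end{aligned} \]

Thus, it suffices to show that $f(\alpha I + (1-\alpha)\Lambda) \geq (1-\alpha) f(\Lambda)$ for all
$\alpha \in (0,1)$ for any diagonal matrix $\Lambda$ with positive diagonal entries. But
this follows easily from the strict concavity of the logarithm function
itself since

\[ \begin{aligned}
f(\alpha I + (1-\alpha)\Lambda) &= \log \prod_{i=1}^n [\alpha + (1-\alpha)\lambda_i] = \sum_{i=1}^n \log[\alpha + (1-\alpha)\lambda_i] \\
&\geq \sum_{i=1}^n [\alpha \log 1 + (1-\alpha) \log \lambda_i] \\
&= (1-\alpha) \sum_{i=1}^n \log \lambda_i = (1-\alpha) \log \prod_{i=1}^n \lambda_i \\
&= (1-\alpha) \log \det \Lambda = (1-\alpha) f(\Lambda)
\end{aligned} \]

Equality holds in this inequality if and only if every $\lambda_i = 1$, which can
happen if and only if $\Lambda = I$ and $B = CIC^* = A$. □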


Theorem (7.6.7) is often used in the following form, which is obtained
by exponentiating the inequality (7.6.8). It gives a quantitative
expression for the fact that a convex combination of positive definite
matrices is positive definite, and hence must be nonsingular.


7.6.9 Corollary. Let $A, B \in M_n$ be positive definite and let $0 < \alpha < 1$.
Then

\[ \det[\alpha A + (1-\alpha)B] \geq (\det A)^{\alpha} (\det B)^{1-\alpha} \]

with equality if and only if $A = B$.
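Inequality (7.6.8) and Corollary (7.6.9) are easy to spot-check on random matrices. The following is a small sketch, assuming NumPy; the helpers `random_pd` and `concavity_gap`, the seed, and the grid of values of $\alpha$ are illustrative assumptions.

```python
import numpy as np

def random_pd(n, rng):
    """A random real positive definite matrix (X X^T + I)."""
    X = rng.standard_normal((n, n))
    return X @ X.T + np.eye(n)

def concavity_gap(alpha, A, B):
    """log det[alpha A + (1-alpha) B] - [alpha log det A + (1-alpha) log det B].
    By (7.6.8) this is nonnegative for positive definite A, B and 0 < alpha < 1."""
    _, logdet_mix = np.linalg.slogdet(alpha * A + (1 - alpha) * B)
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_B = np.linalg.slogdet(B)
    return logdet_mix - (alpha * logdet_A + (1 - alpha) * logdet_B)

rng = np.random.default_rng(1)
A, B = random_pd(3, rng), random_pd(3, rng)

gaps = [concavity_gap(a, A, B) for a in np.linspace(0.1, 0.9, 9)]
print("smallest gap for A != B:", min(gaps))         # strictly positive
print("gap when B = A:", concavity_gap(0.5, A, A))   # zero up to rounding
```

Working with `slogdet` rather than `det` checks the logarithmic form (7.6.8) directly and avoids overflow for larger matrices.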


Problems 


1. Suppose $A \in M_n$ satisfies $A^* = S^{-1}AS$ with $S \in M_n$ positive definite.
Show that A is diagonalizable and all the eigenvalues of A are real. Hint:
Consider $AS = SA^*$. Show that AS is Hermitian and use (7.6.3).




2. Show that $f(A) = \operatorname{tr} A^{-1}$ is a strictly convex function on the positive
definite matrices. Hint: The proof of (7.6.7).


3. How does (7.6.3) generalize if $A \in M_n$ is positive semidefinite? Show
that the eigenvalues of AB are still real and that AB has no more positive
eigenvalues and no more negative eigenvalues than B has, but that it may
have more zero eigenvalues.


4. To what extent does (7.6.3) generalize if Be M, is not Hermitian? 


5. Show by example that Hermitian matrices might be simultaneously 
diagonalized by congruence without the hypothesis of (7.6.4) being 
satisfied. 


6. Let A, Be M, be given Hermitian matrices. What are all the possi- 
bilities for the signs of real parts of the eigenvalues of AB in terms of 
those of A and B? Can you generalize this to Mp? 


7. Let $A, B \in M_n$ be Hermitian with A positive definite. Use (7.6.5) to
show that A + B is positive definite if and only if every eigenvalue of
$A^{-1}B$ is greater than −1. Hint: $A + B = A(I + A^{-1}B)$.


8. Let He M,, be Hermitian, and write H=A+iB with A, Be M,R). 
Verify that A is symmetric and B is skew-symmetric, so the eigenvalues 
of B are pure imaginary and occur in conjugate pairs. Show that H is 
positive definite if and only if A is positive definite and every eigenvalue 
of iA~'B is greater than —1. Hint: Use the fact that x*Hx = x*Ax for all 
xe R”. Use Problem 7. If A is positive definite, show that if \ is an eigen- 
value of iA~'B, then so is —. Conclude that H is positive definite if and 
only if A is positive definite and every eigenvalue of iA~'B lies in the 
interval (—1,1), and the eigenvalues of iA~'B occur in pairs {—), A}. 
Conclude that 0 < det ¿A'B <1 and hence that det B < det A, an inequal- 
ity of H. P. Robertson. Now write H =A +iB=A(I+iA~'B) and show 
that if H is positive definite, then det H = det A det(1+iA~'B) and 
0<det([+iA~'B) <1. Conclude that if H is positive definite, then 
det H < det A, an inequality of O. Taussky. See (7.8.7) and Problem 7 of 
Section (7.8) for a version of these inequalities that is valid for general 
complex matrices He M,,. 


9. In Theorem (4.1.7) we found that a matrix $A \in M_n$ is the product of
two Hermitian matrices if and only if A is similar to a real matrix. Use
(7.6.3) to show that $A \in M_n$ is the product of two positive definite
Hermitian matrices if and only if A is diagonalizable and has positive
eigenvalues. Hint: For the converse, consider
$A = S\Lambda S^{-1} = (SS^*)(S^{*-1}\Lambda S^{-1})$.


10. If $A, B \in M_n$ are positive definite, then we know that the product AB
is positive definite if and only if AB is Hermitian. Show that the same is
true for products of three positive definite matrices; that is, if
$A, B, C \in M_n$ are positive definite, then the product $S = ABC$ is positive
definite if and only if it is Hermitian. Hint: Write $S = (AB)C = EC$, where
E has n positive eigenvalues by Problem 9. Use (7.6.3) to show that
$E = SC^{-1}$ has the same number of positive eigenvalues as S if S is
Hermitian.


11. Provide the details for the following alternate proof of the result
in Problem 10: Let $S(\alpha) = [(1-\alpha)C + \alpha A]BC$ for $0 \leq \alpha \leq 1$. If
$S = S(1)$ is Hermitian, then all $S(\alpha)$ are Hermitian because $S(0) = CBC$ is
automatically Hermitian. Argue that all $S(\alpha)$ are nonsingular because
$(1-\alpha)C + \alpha A$ is nonsingular. The eigenvalues of $S(\alpha)$ depend
continuously on $\alpha$, all the eigenvalues are positive for $\alpha = 0$, and no
eigenvalue vanishes since all $S(\alpha)$ are nonsingular. Conclude that all
eigenvalues of S(1) are positive.


Further Reading. For further results about products of matrices from 
various positivity classes and references to earlier results about products 
of several positive definite matrices, see C. S. Ballantine and C. R. John- 
son, “Accretive Matrix Products,” Lin. Multilin. Alg. 3 (1975), 169-185. 


7.7 The positive semidefinite ordering


Because Hermitian matrices are generalizations of real numbers and
positive definite matrices are generalizations of positive real numbers, it is
natural to ask whether there is a good notion of inequality or (partial)
order among Hermitian matrices.


7.7.1 Definition. Let $A, B \in M_n$ be Hermitian matrices. We write
$A \succeq B$ if the matrix A − B is positive semidefinite. Similarly, $A \succ B$
means that A − B is positive definite.


Exercise. Show that the above notion of inequality is consistent with the
notion of equality of matrices. That is, show that $A \succeq B$ and $B \succeq A$ imply
that A = B.


Exercise. Show that the relation $\succeq$ is transitive and reflexive, but that
it is not a total order; that is, there exist Hermitian matrices
$A, B \in M_n$ such that neither $A \succeq B$ nor $B \succeq A$. Such a relation is called a
partial order.


A partial order on a real linear space is often defined by identifying
some special closed convex cone and saying that one element is greater
than or equal to the other if their difference lies in the special cone. In
this case, the set of Hermitian n-by-n matrices is the real linear space and
the set of positive semidefinite matrices is the closed convex cone. This is
clearly a generalization of the familiar case in which R itself is the real
linear space and the nonnegative real numbers are the closed convex
cone: This gives the “usual” (total) order (not just a partial order) on R.

Various other notions of “inequality” among matrices, most notably
component-wise domination for real matrices, can be defined in a similar
manner: Identify a cone of matrices generalizing the nonnegative real
numbers and say that A is “greater than or equal to” B if their difference
A − B lies in the cone. Generally, different such notions of “inequality”
can be distinguished by the context, but their utility hinges upon how far
the analogy with real numbers extends and how strongly the notion
relates to other inequalities (such as those among eigenvalues,
determinants, etc.).

Notice that A is positive semidefinite if and only if $A \succeq 0$ and A is
positive definite if and only if $A \succ 0$, where 0 is the zero matrix of the
same size as A.


Exercise. Show by example that the positive semidefinite partial ordering
differs from the total ordering of real numbers in the following way: If
$A \succeq B$ and if A is not equal to B, it does not follow that $A \succ B$.


We next illustrate some of the properties of the positive semidefinite
ordering, each of which may be thought of as generalizing the usual
ordering of the reals. The analogy is generally quite strong.


7.7.2 Observation. If $A, B \in M_n$ are Hermitian, then

\[ A \succeq B \quad \text{implies} \quad T^*AT \succeq T^*BT \]

for all $T \in M_{n,m}$; if $m \leq n$, we also have

\[ A \succ B \quad \text{implies} \quad T^*AT \succ T^*BT \]

whenever $T \in M_{n,m}$ has rank m.

Proof: If A − B is positive semidefinite, then $y^*(A-B)y \geq 0$ for all
$y \in \mathbb{C}^n$. Thus, $x^*(T^*AT - T^*BT)x = (Tx)^*(A-B)(Tx) \geq 0$ for all $x \in \mathbb{C}^m$
which, in turn, means that $T^*AT - T^*BT$ is positive semidefinite and
therefore that $T^*AT \succeq T^*BT$. Note that this generalizes (7.1.6) and the
proof is essentially the same.


Exercise. Verify the second statement above to complete the proof. □


7.7.3 Theorem. Let $A, B \in M_n$ be Hermitian matrices, and suppose A
is positive definite and B is positive semidefinite. Then $A \succeq B$ if and only
if $\rho(BA^{-1}) \leq 1$, and $A \succ B$ if and only if $\rho(BA^{-1}) < 1$.


Proof: By (7.6.5) we can find a nonsingular $C \in M_n$ such that $A = CIC^*$
and $B = CDC^*$, where $D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$ is diagonal. Then $A \succeq B$ if
and only if $C[I - D]C^* \succeq 0$, which is the case if and only if $d_i \leq 1$ for all
i = 1, 2, ..., n. But since $BA^{-1} = CDC^*C^{*-1}C^{-1} = CDC^{-1}$, the eigenvalues
of $BA^{-1}$ are precisely $d_1, d_2, \ldots, d_n$ [which are all nonnegative by (7.6.3)],
and all $d_i \leq 1$ if and only if $\rho(BA^{-1}) \leq 1$. The last assertion follows from a
careful examination of the inequalities just used. □
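The spectral radius criterion of Theorem (7.7.3) can be compared with a direct test of the ordering on a small example. This is a sketch only, assuming NumPy; the function names `loewner_geq` and `spectral_radius`, the tolerance, and the 2-by-2 matrices are illustrative choices.

```python
import numpy as np

def loewner_geq(A, B, tol=1e-10):
    """True if A - B is positive semidefinite (up to the tolerance tol)."""
    return bool(np.min(np.linalg.eigvalsh(A - B)) >= -tol)

def spectral_radius(M):
    """Largest modulus of an eigenvalue of M."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

A = np.array([[2.0, 1.0], [1.0, 2.0]])    # positive definite
B1 = np.array([[1.0, 0.0], [0.0, 1.0]])   # A - B1 = [[1, 1], [1, 1]] is semidefinite
B2 = np.array([[3.0, 0.0], [0.0, 0.0]])   # A - B2 = [[-1, 1], [1, 2]] is indefinite

rho1 = spectral_radius(B1 @ np.linalg.inv(A))   # eigenvalues 1/3 and 1
rho2 = spectral_radius(B2 @ np.linalg.inv(A))   # eigenvalues 2 and 0

print(loewner_geq(A, B1), rho1)   # ordering holds, spectral radius is 1
print(loewner_geq(A, B2), rho2)   # ordering fails, spectral radius is 2
```

The first pair sits exactly on the boundary: A − B1 is singular, so the semidefinite ordering holds but the strict one does not, matching a spectral radius equal to 1.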


7.7.4 Corollary. If $A, B \in M_n$ are positive definite, then

(a) $A \succeq B$ if and only if $B^{-1} \succeq A^{-1}$;

(b) If $A \succeq B$, then $\det A \geq \det B$ and $\operatorname{tr} A \geq \operatorname{tr} B$; and

(c) More generally, if $A \succeq B$, then $\lambda_k(A) \geq \lambda_k(B)$ for all k = 1, 2, ..., n
if the respective eigenvalues of A and B are arranged in the same
(increasing or decreasing) order.


Proof: We know that $A \succeq B$ if and only if $\rho(BA^{-1}) \leq 1$. But
$\rho(BA^{-1}) = \rho(A^{-1}B)$, and (7.7.3) says that $\rho(A^{-1}B) \leq 1$ if and only if
$B^{-1} \succeq A^{-1}$. If $A \succeq B$, then $\rho(BA^{-1}) \leq 1$, and since all the eigenvalues of
$BA^{-1}$ are nonnegative by (7.6.3), we know they must lie in the interval
(0, 1]. But then their product is at most 1, so $\det(BA^{-1}) \leq 1$ and hence
$\det A \geq \det B$. In the proof of (7.7.3) we found that $A = CC^*$ and
$B = CDC^*$ with $C = [c_{ij}] \in M_n$, $D = \operatorname{diag}(d_1, d_2, \ldots, d_n) \in M_n$, and
$0 < d_i \leq 1$ for all i = 1, 2, ..., n. One computes easily that

\[ \operatorname{tr} A = \operatorname{tr} CC^* = \sum_{i,j=1}^n |c_{ij}|^2 \]

and

\[ \operatorname{tr} B = \operatorname{tr} CDC^* = \operatorname{tr} DC^*C = \sum_{i,j=1}^n d_i |c_{ji}|^2 \leq \sum_{i,j=1}^n |c_{ji}|^2 = \operatorname{tr} A \]
The last assertion (which implies the determinant and trace inequalities,
for which we have given independent proofs) follows immediately from
the Courant-Fischer variational characterization of the ordered
eigenvalues of a Hermitian matrix and is covered by Corollary (4.3.3). □


Exercise. If $A \succ B \succ 0$, show that $\det A > \det B$ and $\operatorname{tr} A > \operatorname{tr} B$.


The form for the inverse of a partitioned matrix (0.7.3), when
specialized to the case of Hermitian matrices, yields the following useful
formula:

\[ \begin{bmatrix} A & B \\ B^* & C \end{bmatrix}^{-1} = \begin{bmatrix} (A - BC^{-1}B^*)^{-1} & A^{-1}B(B^*A^{-1}B - C)^{-1} \\ (B^*A^{-1}B - C)^{-1}B^*A^{-1} & (C - B^*A^{-1}B)^{-1} \end{bmatrix} \tag{7.7.5} \]

In this formula we assume that A and C are square and that the
appropriate matrices are nonsingular.

If the matrix $\begin{bmatrix} A & B \\ B^* & C \end{bmatrix}$ is positive definite, then $\begin{bmatrix} A & B \\ B^* & C \end{bmatrix}^{-1}$ exists and is
positive definite. It then follows from (7.7.5) and (7.1.2) that
$(A - BC^{-1}B^*)^{-1}$ and $A - BC^{-1}B^*$ are positive definite. Similarly,
$C - B^*A^{-1}B$, A, and C are positive definite. Thus, if the partitioned
Hermitian matrix $\begin{bmatrix} A & B \\ B^* & C \end{bmatrix}$ is positive definite, we have

\[ A \succ 0, \quad C \succ 0, \quad A \succ BC^{-1}B^*, \quad \text{and} \quad C \succ B^*A^{-1}B \]


7.7.6 Theorem. Suppose that a Hermitian matrix is partitioned as

\[ \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \]

where A and C are square. This matrix is positive definite if and only if
A is positive definite and $C \succ B^*A^{-1}B$. Furthermore, this condition is
equivalent to having $\rho(B^*A^{-1}BC^{-1}) < 1$.


Proof: The necessity of the two conditions has been noted above. For
sufficiency, suppose that A is positive definite and $C \succ B^*A^{-1}B$, and
calculate, for $X = -A^{-1}B$,

\[ \begin{bmatrix} I & 0 \\ X^* & I \end{bmatrix} \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & C - B^*A^{-1}B \end{bmatrix} \]

Since the right-hand side is positive definite, the positive definiteness of

\[ \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \]

follows from the exhibited congruence and (7.1.6) or (7.7.2). The last
assertion follows from (7.7.3) applied to the inequality $C \succ B^*A^{-1}B$. □


Exercise. If $\begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \succ 0$, show that $\det C > \det B^*A^{-1}B$ and
$\det A > \det BC^{-1}B^*$. What does this say when $B \in M_{n,1}$? Show that
$\det A \det C > |\det B|^2$ if B is square.

Exercise. Suppose that $A \in M_n$, $C \in M_m$, and $B \in M_{n,m}$, and suppose that
both A and C are positive definite. Show that $\begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \succeq 0$ if and only if
$\rho(B^*A^{-1}BC^{-1}) \leq 1$.


The partitioned positive definite matrix in (7.7.6) is related to some
bilinear inequalities arising in complex function theory and harmonic
analysis that share some of the properties of the positive definite partial
order.


7.7.7 Theorem. Let $A \in M_n$ and $C \in M_m$ be positive definite, and let
$B \in M_{n,m}$. The following are equivalent:

(a) $(x^*Ax)(y^*Cy) \geq |x^*By|^2$ for all $x \in \mathbb{C}^n$ and all $y \in \mathbb{C}^m$

(b) $x^*Ax + y^*Cy \geq 2|x^*By|$ for all $x \in \mathbb{C}^n$ and all $y \in \mathbb{C}^m$

(c) $\rho(B^*A^{-1}BC^{-1}) \leq 1$

(d) $\begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \succeq 0$


Proof: We shall show that (a) implies (b) implies (c) implies (a); we
already know that (c) and (d) are equivalent. If (a) holds, then by the
arithmetic-geometric mean inequality we have

\[ \tfrac{1}{2}(x^*Ax + y^*Cy) \geq (x^*Ax)^{1/2}(y^*Cy)^{1/2} \geq |x^*By| \]

so (b) follows. If we assume (b), then

\[ x^*Ax + y^*Cy = (A^{1/2}x)^*(A^{1/2}x) + (C^{1/2}y)^*(C^{1/2}y) \geq 2|x^*By| \]

and hence for every $x \in \mathbb{C}^n$ and every $y \in \mathbb{C}^m$ we have

\[ x^*x + y^*y \geq 2|(A^{-1/2}x)^*B(C^{-1/2}y)| = 2|x^*A^{-1/2}BC^{-1/2}y| \]

If we set $x = A^{-1/2}BC^{-1/2}y$ in this inequality, we obtain

\[ y^*C^{-1/2}B^*A^{-1}BC^{-1/2}y + y^*y \geq 2|y^*C^{-1/2}B^*A^{-1}BC^{-1/2}y| \]

Since the matrix $C^{-1/2}B^*A^{-1}BC^{-1/2}$ is positive semidefinite, this is
equivalent to

\[ y^*y \geq y^*C^{-1/2}B^*A^{-1}BC^{-1/2}y \quad \text{for all } y \in \mathbb{C}^m \]

If y is chosen to be an eigenvector of $C^{-1/2}B^*A^{-1}BC^{-1/2}$, this inequality
says that the (necessarily nonnegative) associated eigenvalue is bounded
by 1, so we conclude that the spectral radius is at most 1; that is,
$1 \geq \rho(C^{-1/2}B^*A^{-1}BC^{-1/2}) = \rho(B^*A^{-1}BC^{-1})$ and (c) follows. Finally, if
(c) holds, then for any $x \in \mathbb{C}^n$ and any $y \in \mathbb{C}^m$ we have

\[ |x^*(A^{-1/2}BC^{-1/2}y)|^2 \leq \|x\|_2^2 \, \|A^{-1/2}BC^{-1/2}y\|_2^2 = (x^*x)(y^*C^{-1/2}B^*A^{-1}BC^{-1/2}y) \leq (x^*x)(y^*y) \]

where $\|x\|_2 = (x^*x)^{1/2}$ is the Euclidean norm. If we now make the
substitution $x \to A^{1/2}x$ and $y \to C^{1/2}y$, we obtain

\[ |x^*By|^2 \leq (x^*Ax)(y^*Cy) \quad \text{for all } x \in \mathbb{C}^n \text{ and all } y \in \mathbb{C}^m \qquad \square \]
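The criterion of Theorem (7.7.6) gives a way to test a partitioned Hermitian matrix for positive definiteness using only its blocks. The sketch below assumes NumPy; the names `is_pd` and `block_pd_via_schur`, the tolerance, and the small test blocks are hypothetical, and the eigenvalue-based check on the assembled matrix is included only for comparison.

```python
import numpy as np

def is_pd(M, tol=1e-12):
    """True if the Hermitian part of M has all eigenvalues larger than tol."""
    M = (M + M.conj().T) / 2
    return bool(np.min(np.linalg.eigvalsh(M)) > tol)

def block_pd_via_schur(A, B, C):
    """Test [[A, B], [B*, C]] for positive definiteness via (7.7.6):
    A positive definite and C - B* A^{-1} B positive definite."""
    if not is_pd(A):
        return False
    schur = C - B.conj().T @ np.linalg.solve(A, B)
    return is_pd(schur)

A = np.array([[2.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0], [1.0]])          # here B* A^{-1} B = [[1.0]]
C_good = np.array([[2.0]])            # 2 - 1 > 0
C_bad = np.array([[0.5]])             # 0.5 - 1 < 0

P_good = np.block([[A, B], [B.T, C_good]])
P_bad = np.block([[A, B], [B.T, C_bad]])

print(block_pd_via_schur(A, B, C_good), is_pd(P_good))
print(block_pd_via_schur(A, B, C_bad), is_pd(P_bad))
```

In both cases the block test agrees with the direct eigenvalue test on the full matrix.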


For another sort of inequality that stems from (7.7.5), we consider 
two possible operations that might be applied to a positive definite 
matrix: inversion and extraction of a principal submatrix based upon a 
prescribed set of indices. We know that both operations preserve positive 
definiteness, but is there any relation between the result of applying these 
operations in the two possible orders? The intriguing fact is that these 
two operations “commute except for an inequality.” 


7.7.8 Theorem. Suppose that $P \in M_n$ is positive definite, and let
$S \subset \{1, 2, \ldots, n\}$ be an index set. Then

\[ P^{-1}(S) \succeq [P(S)]^{-1} \]

where the left-hand side of this inequality is the principal submatrix of
$P^{-1}$ determined by deletion of the rows and columns indicated by S,
while the right-hand side is the inverse of the corresponding submatrix
of P.


Proof: Since the set of positive definite matrices is closed under
permutation congruence, we may assume that

\[ P = \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \]

and that $P(S) = A$. Then $P^{-1}(S) = (A - BC^{-1}B^*)^{-1}$ and $[P(S)]^{-1} = A^{-1}$.
Since $C \succ 0$ (because $P \succ 0$), we have $BC^{-1}B^* \succeq 0$ and

\[ A \succeq A - BC^{-1}B^* \succ 0 \]

The asserted inequality then follows from (7.7.4a). □


Theorem (7.7.8) may be paraphrased “The inverse of a principal
submatrix is less than or equal to the corresponding principal submatrix of
the inverse” for a positive definite matrix.
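A numerical spot check of Theorem (7.7.8) takes only a few lines. This is a sketch assuming NumPy; the random matrix, the seed, and the list `idx` are illustrative. Here `idx` lists the rows and columns that are kept, which is an assumption about bookkeeping only, since the inequality holds for every principal submatrix.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 5))
P = X @ X.T + np.eye(5)              # positive definite

idx = [0, 2, 3]                      # rows/columns of the principal submatrix (kept)
sub = np.ix_(idx, idx)

sub_of_inverse = np.linalg.inv(P)[sub]   # principal submatrix of the inverse
inverse_of_sub = np.linalg.inv(P[sub])   # inverse of the principal submatrix

difference = sub_of_inverse - inverse_of_sub
min_eig = float(np.min(np.linalg.eigvalsh(difference)))
print("smallest eigenvalue of the difference:", min_eig)  # nonnegative up to rounding
```

The difference is positive semidefinite but typically singular, so the smallest eigenvalue is zero up to rounding rather than strictly positive.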
One application of (7.7.8) is to a particular choice of a principal
submatrix that produces the Hadamard product from the Kronecker product
(see [HJ]). If $A, B \in M_n$, and if $S = \{1, n+2, 2n+3, 3n+4, \ldots, n^2\}$, then
$A \circ B = (A \otimes B)(S)$. If A and B are invertible, then $A \otimes B$ is invertible and
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$. Thus, if A and B are positive definite, and if
(7.7.8) is applied to $P = A \otimes B$, we obtain

\[ A^{-1} \circ B^{-1} = (A^{-1} \otimes B^{-1})(S) = (A \otimes B)^{-1}(S) \succeq [(A \otimes B)(S)]^{-1} = (A \circ B)^{-1} \]

If we take B = A, this says that $A^{-1} \circ A^{-1} \succeq (A \circ A)^{-1}$. But if we take
$B = A^{-1}$, this says that $A^{-1} \circ A \succeq (A \circ A^{-1})^{-1} = (A^{-1} \circ A)^{-1}$ whenever A is
positive definite.

The last inequality says that $A^{-1} \circ A$ dominates its own inverse. What
does this say about $A^{-1} \circ A$? If C is a positive definite matrix with
$C = U\Lambda U^*$, $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, and all $\lambda_i > 0$, then $C \succeq C^{-1}$ if and only
if all $\lambda_i \geq 1$, and hence $C \succeq I \succeq C^{-1}$. We may summarize these
observations as


7.7.9 Theorem. Let $A, B \in M_n$ be positive definite. Then

(a) $A^{-1} \circ B^{-1} \succeq (A \circ B)^{-1}$;

(b) $A^{-1} \circ A^{-1} \succeq (A \circ A)^{-1}$; and

(c) $A^{-1} \circ A \succeq I \succeq (A^{-1} \circ A)^{-1}$.

Since $A^{-1}A = I$, the first part of (c) says that $A^{-1} \circ A \succeq A^{-1}A$; that is,
Hadamard multiplication dominates ordinary multiplication in this case.
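Both the index set that extracts the Hadamard product from the Kronecker product and parts (a) and (c) of Theorem (7.7.9) can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `random_pd`, the seed, and the size n = 3 are illustrative, and the index set is shifted to 0-based indexing.

```python
import numpy as np

def random_pd(n, rng):
    """A random real positive definite matrix (X X^T + I)."""
    X = rng.standard_normal((n, n))
    return X @ X.T + np.eye(n)

rng = np.random.default_rng(3)
n = 3
A, B = random_pd(n, rng), random_pd(n, rng)

# S = {1, n+2, 2n+3, ..., n^2} in the text's 1-based indexing,
# i.e. {0, n+1, 2n+2, ...} in 0-based indexing.
S = [k * (n + 1) for k in range(n)]
kron_sub = np.kron(A, B)[np.ix_(S, S)]
hadamard = A * B                       # entrywise (Hadamard) product

A_inv, B_inv = np.linalg.inv(A), np.linalg.inv(B)

# (a): A^{-1} o B^{-1} - (A o B)^{-1} should be positive semidefinite.
min_eig_a = float(np.min(np.linalg.eigvalsh(A_inv * B_inv - np.linalg.inv(A * B))))
# (c): A^{-1} o A - I should be positive semidefinite.
min_eig_c = float(np.min(np.linalg.eigvalsh(A_inv * A - np.eye(n))))

print(np.allclose(kron_sub, hadamard), min_eig_a, min_eig_c)
```

Note that `A * B` on NumPy arrays is the entrywise product, while `A @ B` is the ordinary matrix product.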


Problems 


1. In general, if $A, B \in M_n$ are Hermitian and $A \succeq B$, show that if
$\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ are the ordered eigenvalues of A and if
$\mu_1 \leq \mu_2 \leq \cdots \leq \mu_n$ are the ordered eigenvalues of B, then $\lambda_i \geq \mu_i$,
i = 1, 2, ..., n. Show by example, however, that the converse is not always
valid.


2. If $A_1, A_2, B_1, B_2 \in M_n$ are all Hermitian, show that if $A_1 \succeq B_1$ and
$A_2 \succeq B_2$, then $A_1 + A_2 \succeq B_1 + B_2$.

3. Let $A, B, C \in M_n$ be Hermitian; assume that $A \succeq B$ and $C \succeq 0$. Show
that $A \circ C \succeq B \circ C$.

4. Let $A, B, C, D \in M_n$ be Hermitian and suppose that $A \succeq B \succeq 0$ and
$C \succeq D \succeq 0$. Use the previous problem to show that $A \circ C \succeq B \circ D \succeq 0$.


5. If $A, B \in M_n$ are Hermitian matrices such that $A \succeq B$, and if
$J \subset \{1, 2, \ldots, n\}$ is any index set, then show that $A(J) \succeq B(J)$.


6. Show that (7.7.6) generalizes (7.2.5) for n = 2.


7. What does (7.7.6) say if $C \in M_1$? How does one augment a positive
definite matrix with one row and column and preserve positive
definiteness?


8. Show that the inequality of (7.7.8) is strict if and only if P(S, S') has
full row rank and equality holds precisely when P(S, S') = 0. Here, P(S, S')
is the submatrix of P resulting from deletion of the rows indicated by S and
the columns indicated by S'. Hint: Show that
rank$[P^{-1}(S) - P(S)^{-1}]$ = rank P(S, S').


9. If $A \in M_n$ is Hermitian, show that $I \succeq A$ if and only if all the
eigenvalues of A are less than or equal to 1.


10. Use (7.7.7) to give an alternate solution to Problem 11 of Section
(7.4). Hint: Show that $\begin{bmatrix} B & I \\ I & B^{-1} \end{bmatrix} \succeq 0$ if $B \succ 0$.


11. Consider (7.7.7) when A = C. Show that the following are equivalent:
(a) $(x^*Ax)(y^*Ay) \geq |x^*By|^2$ for all $x, y \in \mathbb{C}^n$
(b) $x^*Ax + y^*Ay \geq 2|x^*By|$ for all $x, y \in \mathbb{C}^n$
(c) $\rho(B^*A^{-1}BA^{-1}) \leq 1$
(d) $x^*Ax \geq |x^*Bx|$ for all $x \in \mathbb{C}^n$


12. Show that if $A \in M_n$ is invertible and symmetric, then all the row
sums of $A^{-1} \circ A$ are equal to 1. Hint: Look at the cofactor representation
for the elements of $A^{-1}$. Thus, if A is real and positive definite, show that
it is not possible to have $A^{-1} \circ A \succ I$ even though $A^{-1} \circ A \succeq I$.


13. If $A^{(k)}$ denotes the Hadamard kth power of A, and if $A \in M_n$ is
positive definite, show that $(A^{-1})^{(k)} \succeq (A^{(k)})^{-1}$ for all k = 1, 2, ....


Further Reading. For the background to (7.7.7) and further references
see C. FitzGerald and R. Horn, “On the Structure of Hermitian-Symmetric
Inequalities,” J. London Math. Soc. 15(2) (1977), 419-430. See
also C. Johnson, “Partitioned and Hadamard Product Matrix
Inequalities,” J. Research NBS 83 (1978), 585-591, for further references related
to (7.7.8), (7.7.9).


7.8 Inequalities for positive definite matrices


We next discuss inequalities involving quantities associated with one or
more positive definite matrices. These are to be distinguished from the
matrix inequalities introduced in the preceding section, although
examples of the former are often associated with instances of the latter. For
example, $A \succeq B \succ 0$ implies $\det A \geq \det B$. The positive definite matrices
are rich in inequalities involving determinants, eigenvalues, and other
quantities. In this section, we examine some inequalities that do not
necessarily stem from matrix inequalities.

The fundamental determinant inequality for positive definite matrices
is Hadamard’s inequality. Many other inequalities are generalizations in
one way or another of this one result.


7.8.1 Theorem (Hadamard’s inequality). If $A = [a_{ij}] \in M_n$ is positive
semidefinite, then

\[ \det A \leq \prod_{i=1}^n a_{ii} \]

Furthermore, when A is positive definite, then equality holds if and only
if A is diagonal.


Proof: If A is singular, there is nothing to prove, so assume A is
nonsingular, in which case all $a_{ii} > 0$. Define $d_i = a_{ii}^{-1/2}$ and let
$D = \operatorname{diag}(d_1, d_2, \ldots, d_n)$. Since $\det DAD \leq 1$ if and only if
$\det A \leq a_{11}a_{22} \cdots a_{nn}$, it suffices to assume that each diagonal entry of A
is equal to 1. If $\lambda_1, \ldots, \lambda_n$ are the (necessarily positive) eigenvalues of A,
we have

\[ \det A = \prod_{i=1}^n \lambda_i \leq \left( \frac{1}{n} \sum_{i=1}^n \lambda_i \right)^n = \left( \frac{1}{n} \operatorname{tr} A \right)^n = 1 \]

The inequality follows from the arithmetic-geometric mean inequality
for nonnegative real numbers. Equality holds in the arithmetic-geometric
mean inequality if and only if all $\lambda_i = 1$, but since A is Hermitian and
hence diagonalizable, this can occur if and only if A = I. Thus, equality
holds in the original inequality when A is positive definite if and only if A
is diagonal. □


Another determinantal inequality for general square matrices is
equivalent to (7.8.1), and it also is referred to as Hadamard’s inequality.
Geometrically, |det A| is the volume of the n-dimensional parallelepiped whose
generating edges are given by the rows (or columns) of A. This volume is
largest when the generating edges are orthogonal, and in this case the
volume is the product of the lengths of the edges. Hadamard’s inequality is
an algebraic statement of this geometric inequality.


7.8.2 Corollary (Hadamard’s inequality). For any matrix
$B = [b_{ij}] \in M_n$,

\[ |\det B| \leq \prod_{i=1}^n \left( \sum_{j=1}^n |b_{ij}|^2 \right)^{1/2} \quad \text{and} \quad |\det B| \leq \prod_{j=1}^n \left( \sum_{i=1}^n |b_{ij}|^2 \right)^{1/2} \]

Furthermore, when B is nonsingular, then equality holds if and only if
the rows (respectively, columns) of B are orthogonal.
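Both forms of Hadamard’s inequality are easy to evaluate on a concrete matrix. The sketch below assumes NumPy; the function name `hadamard_row_bound` and the 3-by-3 test matrix (chosen to be positive definite so that (7.8.1) also applies) are illustrative assumptions, not taken from the text.

```python
import numpy as np

def hadamard_row_bound(B):
    """Product of the Euclidean lengths of the rows of B, as in (7.8.2)."""
    return float(np.prod(np.linalg.norm(B, axis=1)))

# Symmetric and strictly diagonally dominant with positive diagonal,
# hence positive definite; its determinant is 18.
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

det_B = float(np.linalg.det(B))
diag_bound = float(np.prod(np.diag(B)))   # bound from (7.8.1): 2 * 3 * 4 = 24
row_bound = hadamard_row_bound(B)         # bound from (7.8.2): sqrt(5 * 11 * 17)

# Equality case of (7.8.2): a matrix with orthonormal rows.
Q, _ = np.linalg.qr(B)
det_Q = abs(float(np.linalg.det(Q)))

print(det_B, diag_bound, row_bound)
print(det_Q, hadamard_row_bound(Q))
```

For this matrix the diagonal bound of (7.8.1) is tighter than the row-length bound of (7.8.2), and the orthogonal factor Q attains equality in (7.8.2).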


Proof: If B is singular, there is nothing to prove. If B is nonsingular,
apply (7.8.1) to the positive definite matrix $A = BB^*$, and take square
roots. The right-hand side of the first inequality is the square root of the
product of the diagonal entries of A, and the left-hand side is the square
root of det A. The rows of B are orthogonal exactly when A is diagonal,
which is the case of equality in (7.8.1). The second inequality follows
from applying the first to $B^*$. □


Exercise. We have deduced (7.8.2) from (7.8.1). Now show that (7.8.1)
follows from (7.8.2). Hint: If A is positive definite, there exists a unique
positive definite B such that $B^2 = A$. Apply (7.8.2) to B and square.


Exercise. Use Hadamard’s inequalities (and variants thereof) to give the
best bound you can for

\[ \det \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \end{bmatrix} \]


Two generalizations that refine Hadamard’s inequality for positive
definite matrices are attributed to Fischer and to Szasz. In Fischer’s
inequality, complementary principal submatrices play the role that
diagonal entries play in Hadamard’s inequality.


7.8.3 Theorem (Fischer’s inequality). Suppose that

\[ P = \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \]

is a positive definite matrix that is partitioned so that A and C are square
and nonempty. Then

\[ \det P \leq (\det A)(\det C) \]


Proof: Let $X = -A^{-1}B$, and compute

\[ \det P = \det \left( \begin{bmatrix} I & 0 \\ X^* & I \end{bmatrix} \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \right) = \det \begin{bmatrix} A & 0 \\ 0 & C - B^*A^{-1}B \end{bmatrix} \]

\[ = (\det A)(\det[C - B^*A^{-1}B]) \leq (\det A)(\det C) \]

The last inequality utilizes (7.7.6) and (7.7.4b) to ensure that
$\det C \geq \det(C - B^*A^{-1}B)$ because $C \succeq C - B^*A^{-1}B \succ 0$. □


Exercise. Deduce Hadamard’s inequality (7.8.1) from Fischer’s inequality.
Also, formulate and state Fischer’s inequality for partitions of P finer
than that in (7.8.3) (two principal submatrices) but not so fine as in
(7.8.1) (n principal submatrices). Note that in this case the right-hand
side of Fischer’s inequality is less than or equal to the right-hand side of
Hadamard’s inequality. Thus, Fischer’s inequality for various partitions
of increasing refinement gives a monotone nondecreasing sequence of
upper bounds on det P.


There is another inequality that gives a sequence of upper bounds on
the determinant and includes Hadamard’s bound. Let $P_k(A)$ denote the
product of all the k-by-k principal minors of A [there are $\binom{n}{k}$ of them].
Notice that $P_n(A) = \det A$ and $P_1(A) = a_{11}a_{22} \cdots a_{nn}$.


7.8.4 Theorem (Szasz’s inequality). If $A \in M_n$ is positive definite, then

\[ P_{k+1}(A)^{\binom{n-1}{k}^{-1}} \leq P_k(A)^{\binom{n-1}{k-1}^{-1}} \quad \text{for all } k = 1, 2, \ldots, n-1 \]


Proof: Since the diagonal entries of $A^{-1}$ are just ratios of (n−1)-by-
(n−1) principal minors of A to det A, direct application of (7.8.1) to the
positive definite matrix $A^{-1}$ yields

\[ \frac{1}{\det A} = \det A^{-1} \leq \frac{P_{n-1}(A)}{(\det A)^n} \]

and hence

\[ P_n(A)^{n-1} = (\det A)^{n-1} \leq P_{n-1}(A) \]

Extraction of (n−1)st roots of both sides of this inequality produces the
case k = n−1 of Szasz’s family of inequalities. The remaining cases may
be derived inductively. For example, for the case k = n−2, one identifies
each (n−1)-by-(n−1) principal submatrix as one initial matrix and
applies the above inequality to get

\[ P_{n-1}(A)^{n-2} \leq P_{n-2}(A)^2 \]

since each (n−2)-by-(n−2) principal submatrix of A occurs twice as a
principal submatrix of some (n−1)-by-(n−1) principal submatrix of A.
Extraction of the (n−1)(n−2)st root of both sides yields the case
k = n−2, and the remaining cases follow in the same way. □


Exercise. Show that Szasz’s inequality implies Hadamard’s inequality 
(7.8.1). What is the case of equality? 
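Szász's chain of bounds can be observed on a small example. The sketch below, in plain Python, computes $P_k(A)$ by brute force over all index sets; the example matrix and the helper names `det` and `P` are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in M]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def P(A, k):
    """Product of all k-by-k principal minors of A."""
    prod = 1.0
    for idx in combinations(range(len(A)), k):
        prod *= det([[A[i][j] for j in idx] for i in idx])
    return prod

# Hypothetical positive definite example (it equals G^T G + I for an integer G).
A = [[7, 3, 2, 5], [3, 7, 3, 5], [2, 3, 11, 4], [5, 5, 4, 8]]
n = len(A)

# s[k-1] = P_k(A)^(1 / C(n-1, k-1)).  Szasz's inequality says that
# s_1 >= s_2 >= ... >= s_n, where s_1 is Hadamard's bound and s_n = det A.
s = [P(A, k) ** (1.0 / comb(n - 1, k - 1)) for k in range(1, n + 1)]
```

The first entry of `s` is the product of the diagonal entries and the last is $\det A$, so the nonincreasing sequence interpolates between Hadamard's bound and the determinant itself.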


7.8.5 Observation. Let $A \in M_n$ be positive semidefinite and define

\[
\alpha(A) \equiv \begin{cases} \dfrac{\det A}{\det A_{11}} & \text{if } A_{11} \text{ is positive definite} \\ 0 & \text{otherwise} \end{cases}
\]

where $A_{11}$ is the (n-1)-by-(n-1) principal submatrix of $A$ that results from deleting the first row and column of $A$. Let $E_{11} \in M_n$ be the matrix whose 1,1 entry is 1 and all of whose remaining entries are 0. Then $A - tE_{11}$ is positive semidefinite for all $t \le \alpha(A)$ and is not positive semidefinite for any $t > \alpha(A)$; in particular, $A - \alpha(A)E_{11}$ is positive semidefinite.


Proof: It suffices to consider the case in which $A$ is positive definite. Use (7.2.5) applied to the "trailing" principal minors. Notice that the first $n-1$ trailing minors of $A - tE_{11}$ are the same as those of $A$, and that $\det(A - tE_{11}) = \det A - t \det A_{11}$. □


Exercise. Provide details for the proof of (7.8.5).

Exercise. Prove Hadamard's inequality (7.8.1) by induction using (7.8.5).


Hadamard's inequality (7.8.1) may also be stated in terms of Hadamard products as

\[
(\det A) \prod_{i=1}^{n} 1 \le \det A \circ I
\]

The following is an inequality of Oppenheim (strengthened by Schur) that generalizes Hadamard's inequality by showing that there is nothing special about the role of the identity matrix in the above inequality.


7.8.6 Theorem (Oppenheim's inequality). If $A, B \in M_n$ are positive semidefinite, then

\[
(\det A) \prod_{i=1}^{n} b_{ii} \le \det A \circ B
\]


Proof: We proceed by induction on n. The assertion is immediate for $n = 1$. If $n \ge 2$ and if it holds for all matrices of size at most $n-1$, then it follows from the induction hypothesis that

\[
(\det A_{11}) \prod_{i=2}^{n} b_{ii} \le \det A_{11} \circ B_{11}
\]

where the notation is the same as in (7.8.5). Notice that $A_{11} \circ B_{11} = (A \circ B)_{11}$. Since $A - \alpha E_{11}$ is positive semidefinite, we know that $(A - \alpha E_{11}) \circ B$ is positive semidefinite, and hence

\[
0 \le \det (A - \alpha E_{11}) \circ B = (\det A \circ B) - \alpha b_{11} (\det A_{11} \circ B_{11})
\]

From this it follows that

\[
\det A \circ B \ge \alpha b_{11} \det A_{11} \circ B_{11} \ge \alpha b_{11} (\det A_{11}) \prod_{i=2}^{n} b_{ii} = (\det A) \prod_{i=1}^{n} b_{ii}
\]
□


Exercise. If $A, B \in M_n$ are positive definite, show that

\[
(\det A)(\det B) \le \det A \circ B
\]

and show further that

\[
(\det A)(\det B) \le (\det A) \prod_{i=1}^{n} b_{ii} \le \det A \circ B \le \prod_{i=1}^{n} a_{ii} \prod_{i=1}^{n} b_{ii}
\]

Exercise. If $A \in M_n$ is positive definite, show that $\det A \circ A^{-1} \ge 1$.


A determinantal inequality of a rather different sort applies to non-Hermitian matrices $A$ for which $H(A)$ is positive definite. It may be thought of as a generalization of the inequality $|z| \ge |\mathrm{Re}\, z|$ for complex numbers.


7.8.7 Theorem (Ostrowski-Taussky). If $A \in M_n$ is such that $H(A) = (A + A^*)/2$ is positive definite, then

\[
\det H(A) \le |\det A|
\]

Equality holds if and only if $A$ is Hermitian.


Proof: Let $S(A) \equiv (A - A^*)/2$, so that $A = H(A) + S(A)$. The asserted inequality is then the statement that

\[
|\det[I + H(A)^{-1}S(A)]| \ge 1
\]

But $H(A)^{-1}S(A)$ is similar to the skew-Hermitian matrix

\[
H(A)^{-1/2} S(A) H(A)^{-1/2}
\]

and hence it has only pure imaginary eigenvalues. Thus, it suffices to note that $|1 + it| \ge 1$ for any real number $t$. If $it_1, it_2, \ldots, it_n$ are the eigenvalues of $H(A)^{-1}S(A)$, then

\[
|\det[I + H(A)^{-1}S(A)]| = \prod_{j=1}^{n} |1 + it_j| \ge 1
\]

Also, equality holds if and only if all $t_j = 0$, which is equivalent to $S(A) = 0$ since a skew-Hermitian matrix is diagonalizable. □
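The Ostrowski-Taussky inequality is easy to test on a real matrix with a prescribed symmetric part. The following sketch uses plain Python; the particular matrices `H` and `S` and the helper `det` are hypothetical choices for illustration.

```python
def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in M]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

# Assumed example: H is symmetric positive definite (leading minors 4, 11, 18)
# and S is skew-symmetric, so A = H + S has Hermitian part H(A) = H.
H = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
S = [[0, 2, -1], [-2, 0, 3], [1, -3, 0]]
n = len(H)
A = [[H[i][j] + S[i][j] for j in range(n)] for i in range(n)]

# Recover the Hermitian part from A itself as a consistency check.
H_part = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]

det_H = det(H_part)     # 18 for this example
abs_det_A = abs(det(A)) # 75 for this example
```

Here $\det H(A) = 18$ and $|\det A| = 75$, consistent with the theorem; the inequality is strict because $S(A) \ne 0$.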


An important determinantal inequality involving the sum of two posi- 
tive definite matrices is due to Minkowski. Its proof is similar to that of 
the preceding result. 


7.8.8 Theorem (Minkowski's inequality). If $A, B \in M_n$ are positive definite, then

\[
[\det(A + B)]^{1/n} \ge (\det A)^{1/n} + (\det B)^{1/n}
\]

Proof: Observing that both sides of the asserted inequality are homogeneous of the same degree, we multiply on the right and left by $(\det A^{-1/2})^{1/n}$. Thus, we may assume without loss of generality that $A = I$, and we must prove that

\[
[\det(I + B)]^{1/n} \ge 1 + (\det B)^{1/n}
\]

If $0 < \lambda_1 \le \cdots \le \lambda_n$ are the eigenvalues of $B$, the desired inequality is then equivalent to

\[
\prod_{i=1}^{n} (1 + \lambda_i) \ge [1 + (\lambda_1 \cdots \lambda_n)^{1/n}]^n
\]

which may be verified directly by explicit multiplication of both sides and term-by-term comparison using the arithmetic-geometric mean inequality. □


Exercise. Provide details for the proof of (7.8.8) and show that equality holds if and only if $B = cA$ for some $c \ge 0$.
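A quick numerical look at Minkowski's inequality, including one instance of the equality case $B = cA$, may help fix the statement. This is a sketch in plain Python; the two 2-by-2 matrices and the helper names are assumptions chosen for illustration.

```python
def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in M]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def minkowski_sides(A, B):
    """Return (det(A+B)^(1/n), det(A)^(1/n) + det(B)^(1/n))."""
    n = len(A)
    total = [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]
    return det(total) ** (1.0 / n), det(A) ** (1.0 / n) + det(B) ** (1.0 / n)

A = [[2, 1], [1, 2]]          # positive definite, det = 3
B = [[3, 0], [0, 1]]          # positive definite, det = 3
lhs, rhs = minkowski_sides(A, B)          # sqrt(14) versus 2*sqrt(3)

# Equality case: B is a positive multiple of A.
B_eq = [[2 * v for v in row] for row in A]
lhs_eq, rhs_eq = minkowski_sides(A, B_eq)  # both equal 3*sqrt(3)
```

In the first case the left side exceeds the right side; in the second case the two sides agree up to rounding.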


Problems 


1. Inequality (7.8.2) says that the magnitude of the determinant is dominated by the product of the $l_2$ norms of the rows. Compare this with the result [see Problem 3 in Section (6.1)] that the magnitude of the determinant is dominated by the product of the $l_1$ norms of the rows. What does each bound say geometrically? Are there other such bounds? Try $l_\infty$.




2. The left-hand side of (7.8.2) is invariant under left unitary multiplica- 
tion of B, and the left-hand side of (7.8.1) is invariant under unitary sim- 
ilarities of A, but the right-hand sides are not so invariant. When are the 
right-hand sides minimized? When are they maximized? Can a better 
bound be obtained in this way? 


3. Use Fischer's inequality to verify the following block generalization of Hadamard's inequality (7.8.2): Let $A = [A_{ij}]$ be an nk-by-nk complex matrix, partitioned so that each block $A_{ij} \in M_k$. Then

\[
|\det A| \le \left[ \prod_{i=1}^{n} \left( \sum_{j=1}^{n} |||A_{ij}|||_2^2 \right)^{1/2} \right]^{k}
\]

May matrix norms other than the spectral norm be used here?
4. Determine the cases of equality in (7.8.6). 
5. Let $A, B \in M_n$ be positive definite and show that

\[
\det A \circ B + (\det A)(\det B) \ge (\det A) \prod_{i=1}^{n} b_{ii} + (\det B) \prod_{i=1}^{n} a_{ii}
\]

Show that this strengthens (7.8.6). Hint: Show that

\[
\alpha(A \circ B) \ge \alpha(A) b_{11} + \alpha(B) a_{11} - \alpha(A)\alpha(B)
\]

and apply this to the natural induction hypothesis.


6. Show that the inequality in the previous problem can be extended 
further to yield 


det Aye Bii 


det Ao B + (det A) (det B) ———————————— 
(det A) (de ) (det A11) (det By) 


n n 
> (det A) J] b; + (det B) [Į a; + (det A) bı (det an( Ann -1) 
i=l i=] 


det Aj; 
+ aera, (det Aj) Era m 1) 
H 


7. If $A \in M_n$ has positive definite Hermitian part $H(A) = (A + A^*)/2$ and if $n > 1$, show that the inequality in (7.8.7) can be strengthened to

\[
\det H(A) + |\det S(A)| \le |\det A|
\]

What is the case of equality? Hint: You must show that

\[
|\det[I + H(A)^{-1}S(A)]| \ge 1 + |\det H(A)^{-1}S(A)|
\]

which is equivalent to having

\[
\prod_{j=1}^{n} |1 + it_j| \ge 1 + \prod_{j=1}^{n} |t_j|
\]

Show that


n fi n 
[J iti? =1+ Ste + 7 
j=l j=l j=] 


n n 2 n \2 
z1l+n I] u+( I] n) >(1+ I] til) 

j=l j=l j=l 
Can you strengthen the inequality further? Note: A natural inequality for complex numbers of which the stated inequality may be thought of as a generalization would be $|z| \ge |\mathrm{Re}\, z| + |\mathrm{Im}\, z|$; show that this inequality is false (hence the assumption that $n > 1$) and conclude that the determinantal inequality is therefore somewhat surprising.


8. If $A, B \in M_n$ are positive definite, show that $\det(A + B) \ge \det A + \det B$.


9. Use Minkowski's inequality to prove Fischer's inequality. Hint: Apply Minkowski's inequality to the two positive definite matrices

\[
\begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} I & 0 \\ 0 & -I \end{bmatrix} \begin{bmatrix} A & B \\ B^* & C \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & -I \end{bmatrix} = \begin{bmatrix} A & -B \\ -B^* & C \end{bmatrix}
\]


10. A positive definite matrix $P \in M_n$ can be factored as $P = LL^*$, where $L$ is lower triangular with positive diagonal entries (7.2.9). Use this fact to prove Fischer's inequality.


11. Let $A, B \in M_n$ be positive semidefinite. If $A$ and $B$ are nonsingular, show that $A \circ B$ is nonsingular (and positive definite). If $A \circ B$ is singular, show that at least one of $A$, $B$ is singular. How does this relate to the inequality $\mathrm{rank}\, A \circ B \le (\mathrm{rank}\, A)(\mathrm{rank}\, B)$ from Section (7.5)?


12. Show that if $A = [a_{ij}] \in M_3$ is a matrix with real entries and if all $|a_{ij}| \le 1$, then $|\det A| \le 3\sqrt{3}$. Also show that this bound is never attained. Hint:

\[
\frac{\partial}{\partial a_{ij}} (\det A) = (-1)^{i+j} A_{ij} \quad \text{and} \quad \frac{\partial^2}{\partial a_{ij}^2} (\det A) = 0
\]

where $A_{ij}$ is the determinant of $A$ with row $i$ and column $j$ deleted. If $A_{ij} = 0$, then $\det A$ is independent of the value of $a_{ij}$, which may therefore be taken to be $\pm 1$. If $A_{ij} \ne 0$, then $\det A$ will not have an extremum with respect to $a_{ij}$ if $0 \le |a_{ij}| < 1$. Thus, $|\det A|$ achieves its maximum value within the given constraints when all $a_{ij} = \pm 1$. There are only finitely many such matrices for $n = 3$. What is the result for general $n > 3$? If $A$ has complex entries, then use the maximum principle (the maximum modulus theorem) for analytic functions to show that $|\det A|$ cannot have a maximum in the interior of the set $\{A \in M_n : \text{all } |a_{ij}| \le 1\}$.


13. If $A = [a_{ij}] \in M_n$ and $K = \max\{|a_{ij}|\}$, use Hadamard's inequality to show that $|\det A| \le K^n n^{n/2}$.


14. Let $A \in M_n$ be positive definite and let $\alpha \subseteq N \equiv \{1, \ldots, n\}$ be an index set. Fischer's inequality may be stated as $\det A \le \det A(\alpha) \det A(\alpha')$, in which $\alpha'$ is the complement of $\alpha$ with respect to $N$. A generalization of this result, often referred to as the Hadamard-Fischer inequalities, is

\[
\det A(\alpha \cup \beta) \le \frac{\det A(\alpha) \det A(\beta)}{\det A(\alpha \cap \beta)} \qquad (7.8.9)
\]

which holds for positive definite Hermitian matrices $A$ and all index sets $\alpha, \beta \subseteq N$. By convention, $\det A(\emptyset) = 1$. Prove the Hadamard-Fischer inequalities using only Fischer's inequality and the second formula in (0.8.4). Hint: Assume without loss of generality that $\alpha \cup \beta = N$ and apply Fischer's inequality to $A^{-1}(\alpha' \cup \beta')$. Then apply (0.8.4) to each minor.


15. Use the fact that a positive definite Hermitian matrix may be written as $LL^*$, with $L$ lower triangular and nonsingular (7.2.9), to give a direct proof of the Hadamard-Fischer inequalities (7.8.9). Hint: Suppose $1 < j < k < n$, and suppose that $\alpha = \{1, \ldots, k\}$ and $\beta = \{1, \ldots, j, k+1, \ldots, n\}$, without loss of generality. Then consider a corresponding block 3-by-3 partitioning of $A$ and $L$.


16. Let $A \in M_n$ be positive definite. Use the Hadamard-Fischer inequalities (7.8.9) to show that

\[
\det A \le \frac{\prod_{i=1}^{n-1} \det A(\{i, i+1\})}{\prod_{i=2}^{n-1} a_{ii}}
\]


17. Let $A \in M_n$ be positive definite. Show that

\[
\det A = \min \left\{ \prod_{i=1}^{n} v_i^* A v_i : \{v_1, \ldots, v_n\} \subset \mathbf{C}^n \text{ is an orthonormal set} \right\}
\]

Hint: Let $V = [v_1 \ \ldots \ v_n] \in M_n$ and apply (7.8.1) to $V^*AV$.


18. Let $A \in M_n$ be positive definite and let $\{u_1, \ldots, u_n\} \subset \mathbf{C}^n$ be an orthonormal set. Use Problem 17 to show that $\{u_1, \ldots, u_n\}$ are eigenvectors of $A$ and $\{u_1^*Au_1, \ldots, u_n^*Au_n\}$ are the corresponding eigenvalues of $A$ if

\[
\det A = \prod_{i=1}^{n} u_i^* A u_i
\]


19. If $A \in M_n$ is positive definite, show that

\[
n(\det A)^{1/n} = \min\{\mathrm{tr}\, AB : B \in M_n \text{ is positive definite and } \det B = 1\}
\]

Hint: Write $A = U \Lambda U^*$, with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, all $\lambda_i > 0$, and unitary $U \in M_n$, so $\mathrm{tr}\, AB = \mathrm{tr}\, \Lambda(U^*BU)$. Then use the arithmetic-geometric mean inequality and Hadamard's inequality (7.8.1) to show that

\[
\frac{1}{n} \mathrm{tr}\, \Lambda B = \frac{1}{n} \sum_{i=1}^{n} \lambda_i b_{ii} \ge \left( \prod_{i=1}^{n} \lambda_i b_{ii} \right)^{1/n} = \left( \det A \prod_{i=1}^{n} b_{ii} \right)^{1/n} \ge [\det A]^{1/n}
\]

with equality possible.


20. Use the quasi-linearization in Problem 19 to prove Minkowski's inequality (7.8.8).


21. Let $A \in M_n$ be positive semidefinite and write $A$ in partitioned form as

\[
A = \begin{bmatrix} a_{11} & x^* \\ x & \tilde{A} \end{bmatrix}
\]

Use the reduction formula for the determinant in Problem 15 of Section (4.1) and Problem 11 in Section (7.2) to show that

\[
\det A = a_{11} \det \tilde{A} - x^* (\mathrm{adj}\, \tilde{A}) x \le a_{11} \det \tilde{A}
\]

Use induction and this inequality to give another proof of Hadamard's inequality (7.8.1).


Further Reading. For more information on Theorem (7.8.7) see A. M. Ostrowski and O. Taussky, "On the Variation of the Determinant of a Positive Definite Matrix," Proc. Kon. Nederl. Acad. Wetensch. Amsterdam, Ser. A, 54 (1951), 383-385. A class of inequalities that relates det A to other generalized matrix functions (0.3.2) of A when A is positive definite and that also generalizes Hadamard's inequality (7.8.1) may be found in I. Schur, "Über endliche Gruppen und Hermitesche Formen," Math. Z. 1 (1918), 184-207.


CHAPTER 8


Nonnegative matrices


8.0 Introduction


Suppose there are $n \ge 2$ cities $C_1, \ldots, C_n$ among which migration takes place as follows: Simultaneously at 8:00 A.M. each day a constant fraction $a_{ij}$ of the current population of city $j$ moves to city $i$ for all $i \ne j$; the fraction $a_{jj}$ of the current population of city $j$ remains in city $j$. Thus, if we denote the population of city $i$ on day $m$ by $p_i^{(m)}$, we have the recursive relation

\[
p_i^{(m+1)} = a_{i1} p_1^{(m)} + \cdots + a_{in} p_n^{(m)}, \quad i = 1, \ldots, n, \quad m = 0, 1, \ldots
\]

between the population distributions on days $m$ and $m+1$. If we denote the n-by-n matrix of migration coefficients by $A = [a_{ij}]$ and the population distribution vector on day $m$ by $p^{(m)} = [p_i^{(m)}]$, then

\[
p^{(m+1)} = A p^{(m)} = A A p^{(m-1)} = \cdots = A^{m+1} p^{(0)}, \quad m = 0, 1, \ldots
\]

where $p^{(0)}$ is the initial population distribution. Since the coefficients $a_{ij}$ represent population fractions, we have $0 \le a_{ij} \le 1$ and $\sum_{i=1}^{n} a_{ij} = 1$ for each $j = 1, \ldots, n$.

In order to make sensible long-range plans for city services and capital investment, government officials wish to know how the total population $p = \sum_i p_i^{(0)}$ will be distributed far into the future; that is, they want to know about the asymptotic behavior of $p^{(m)}$ for large $m$. But since $p^{(m)} = A^m p^{(0)}$, it is apparent that one must look at the asymptotic behavior of $A^m$.

As an example, let us consider in detail the case $n = 2$. We have $a_{11} + a_{21} = 1 = a_{12} + a_{22}$, so if we denote $a_{21} = \alpha$ and $a_{12} = \beta$, we have


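\[
A = \begin{bmatrix} 1 - \alpha & \beta \\ \alpha & 1 - \beta \end{bmatrix}
\]

and we are interested in $A^m$ for large $m$. If $A$ were diagonalizable, we could compute $A^m$ explicitly. Thus, we begin by computing the eigenvalues of $A$: $\lambda_2 = 1$ and $\lambda_1 = 1 - \alpha - \beta$. Since $0 \le \alpha, \beta \le 1$, we have $\lambda_2 = 1 \ge |\lambda_1| = |1 - \alpha - \beta|$, so $1 = |\lambda_2| = \rho(A)$ and the spectral radius of $A$ is an eigenvalue of $A$. Moreover, except in the trivial case $\alpha = \beta = 0$ (in which case $A$ is reducible), we see that $\lambda_2 = \rho(A)$ is a simple eigenvalue of $A$.

If $\alpha + \beta \ne 0$, the respective eigenvectors are $x = [\beta, \alpha]^T$ (for $\lambda_2 = 1$) and $z = [1, -1]^T$ (for $\lambda_1$), so in this case $A$ is diagonalizable and $A = S \Lambda S^{-1}$, where

\[
\Lambda = \begin{bmatrix} 1 & 0 \\ 0 & \lambda_1 \end{bmatrix}, \quad S = \begin{bmatrix} \beta & 1 \\ \alpha & -1 \end{bmatrix}, \quad \text{and} \quad S^{-1} = \frac{1}{\alpha + \beta} \begin{bmatrix} 1 & 1 \\ \alpha & -\beta \end{bmatrix}
\]

Notice that the components of the eigenvector $x$ are nonnegative and are positive if $A$ is irreducible.

If $\alpha$ and $\beta$ are not both 1, then $|\lambda_1| = |1 - \alpha - \beta| < 1$ and so $\lambda_1^m \to 0$ as $m \to \infty$. Thus, in this case we have

\[
\lim_{m \to \infty} A^m = S \left( \lim_{m \to \infty} \Lambda^m \right) S^{-1} = S \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} S^{-1} = \frac{1}{\alpha + \beta} \begin{bmatrix} \beta & \beta \\ \alpha & \alpha \end{bmatrix}
\]

and so the equilibrium population distribution is

\[
\lim_{m \to \infty} p^{(m)} = \frac{1}{\alpha + \beta} \begin{bmatrix} \beta & \beta \\ \alpha & \alpha \end{bmatrix} \begin{bmatrix} p_1^{(0)} \\ p_2^{(0)} \end{bmatrix} = \frac{p_1^{(0)} + p_2^{(0)}}{\alpha + \beta} \begin{bmatrix} \beta \\ \alpha \end{bmatrix}
\]

Notice that the equilibrium distribution is entirely independent of the initial distribution. The matrix $A^m$ approaches a limit whose columns are proportional to the eigenvector $x$ associated with the eigenvalue 1 (which is the spectral radius of $A$), and the limiting population distribution is proportional to this same eigenvector.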
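The two-city equilibrium can be observed by simply iterating the migration rule. The sketch below uses plain Python; the migration fractions and the initial populations are hypothetical values chosen for illustration.

```python
# Hypothetical migration fractions: alpha = a_21, beta = a_12.
alpha, beta = 0.3, 0.1
A = [[1 - alpha, beta],
     [alpha, 1 - beta]]

def step(A, p):
    """One day of migration: returns A p for a 2-by-2 matrix A."""
    return [A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1]]

def iterate(A, p0, days):
    """Apply the migration rule for the given number of days."""
    p = list(p0)
    for _ in range(days):
        p = step(A, p)
    return p

p0 = [900.0, 100.0]           # assumed initial distribution
q0 = [100.0, 900.0]           # a different split of the same total
p_final = iterate(A, p0, 200)
q_final = iterate(A, q0, 200)

# Predicted equilibrium: (p_1 + p_2) / (alpha + beta) * [beta, alpha].
total = sum(p0)
limit = [total * beta / (alpha + beta), total * alpha / (alpha + beta)]
```

With these values the predicted equilibrium is `[250.0, 750.0]`, and both initial distributions approach it; only the total population enters the limit, not how it was initially split between the cities.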

The two exceptional cases not treated above are easily analyzed individually. If $\alpha = \beta = 0$, then $A = I$, $\lim_{m \to \infty} A^m = I$, and $\lim_{m \to \infty} p^{(m)} = p^{(0)}$, so the limiting distribution is not independent of the initial distribution.

If $\alpha = \beta = 1$, then $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ and the two cities exchange their entire populations on successive days. The powers of $A$ do not approach a limit and neither does the population distribution if the initial population distribution is unequal. However, there is a sense in which an "average equilibrium" is attained, namely

\[
\lim_{m \to \infty} \frac{1}{m} \sum_{k=1}^{m} A^k = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \quad \text{and} \quad \lim_{m \to \infty} \frac{1}{m} \sum_{k=1}^{m} p^{(k)} = \frac{p_1^{(0)} + p_2^{(0)}}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]


In summary, we found in this example that $\rho(A) = 1$ and:


1. The spectral radius $\rho(A) = 1$ is itself an eigenvalue of $A$, and is not just the absolute value of an eigenvalue.

2. The eigenvector $x$ associated with the eigenvalue $\rho(A)$ can be taken to have nonnegative components, which are positive if $A$ is irreducible.

3. $\rho(A)$ is a simple eigenvalue of strictly largest modulus if all the entries of $A$ are positive.

4. If all entries of $A$ are positive, then $\lim_{m \to \infty} [A/\rho(A)]^m$ exists and is a rank 1 matrix, all of whose columns are proportional to the eigenvector $x$.

5. In all cases, $\lim_{m \to \infty} (1/m) \sum_{k=1}^{m} (A/\rho(A))^k$ exists.
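Item 5 can be illustrated with the exchange matrix from the exceptional case $\alpha = \beta = 1$, whose powers oscillate while their running averages settle down. This is a minimal sketch in plain Python; the helper `matmul` and the cutoff `m = 1000` are assumptions for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0.0, 1.0], [1.0, 0.0]]   # the two cities swap populations each day
m = 1000

power = [[1.0, 0.0], [0.0, 1.0]]          # A^0
running_sum = [[0.0, 0.0], [0.0, 0.0]]
for k in range(1, m + 1):
    power = matmul(power, A)              # now equals A^k
    for i in range(2):
        for j in range(2):
            running_sum[i][j] += power[i][j]

cesaro_average = [[v / m for v in row] for row in running_sum]
last_power = power                         # A^m; equals I because m is even
```

Since $m$ is even here, `last_power` is the identity while `A` itself is the odd power, so the powers have no limit; the average of the first $m$ powers nevertheless has every entry equal to 0.5.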


These conclusions are in fact generally true for $n \ge 2$, but it is not possible to analyze the general case with the simple direct methods employed above. For example, $A$ need not be diagonalizable when $n > 2$ even if all the entries of $A$ are positive. New tools are required and will be developed in the rest of this chapter.


Problems 


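1. Show that the matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has spectral radius 1, but that $A^m$ is unbounded as $m \to \infty$.


2. Consider the matrix

\[
A_\epsilon = \begin{bmatrix} \dfrac{1}{1+\epsilon} & \dfrac{1}{1+\epsilon} \\[2ex] \dfrac{\epsilon^2}{1+\epsilon} & \dfrac{1}{1+\epsilon} \end{bmatrix}, \quad \epsilon > 0
\]

(a) Show that $\lambda_2 = 1$ is a simple eigenvalue of $A_\epsilon$, that $\rho(A_\epsilon) = \lambda_2 = 1$, and that $1 > |\lambda_1|$. (b) Show that

\[
x = \frac{1}{1+\epsilon} \begin{bmatrix} 1 \\ \epsilon \end{bmatrix} \quad \text{and} \quad y = \frac{1+\epsilon}{2\epsilon} \begin{bmatrix} \epsilon \\ 1 \end{bmatrix}
\]

are eigenvectors of $A_\epsilon$ and $A_\epsilon^T$, respectively, corresponding to the eigenvalue $\lambda_2 = 1$. (c) Calculate $A_\epsilon^m$ explicitly, $m = 1, 2, \ldots$. (d) Show that

\[
\lim_{m \to \infty} A_\epsilon^m = \frac{1}{2} \begin{bmatrix} 1 & \epsilon^{-1} \\ \epsilon & 1 \end{bmatrix}
\]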
m0 € 1 


(e) Calculate $xy^T$ and comment. (f) What happens if $\epsilon \to 0$? Hint: Set $B_\epsilon = (1+\epsilon)A_\epsilon$ and proceed as in the text to diagonalize $B_\epsilon$.


3. Interpret what it means for the general matrix of intercity migration coefficients to be irreducible in terms of the freedom of travel of the populace.


4. For the two-city example discussed in this section, show that $\lim_{m \to \infty} (1/m) \sum_{k=1}^{m} A^k$ exists in the cases $\alpha = \beta = 0$ and $\alpha + \beta \ne 0$. What is the limit in each case?


Further Readings. For a wealth of information about properties of positive and nonnegative matrices as well as many references to the theoretical and applied literature, see [BPl] and [Sen]. In [Var] there is a summary of results about nonnegative matrices, with special emphasis on applications to numerical analysis.


8.1 Nonnegative matrices - inequalities and generalities


Let $B = [b_{ij}] \in M_{n,r}$ and $A = [a_{ij}] \in M_{n,r}$. We write

B ≥ 0 if all $b_{ij} \ge 0$
B > 0 if all $b_{ij} > 0$
A ≥ B if A − B ≥ 0
A > B if A − B > 0

The reverse relations ≤ and < are defined similarly. We define $|A| \equiv [|a_{ij}|]$. If $A \ge 0$, we say $A$ is a nonnegative matrix, and if $A > 0$, we say that $A$ is a positive matrix. The following simple facts follow immediately from the definitions.


Exercise. Let $A, B \in M_{n,r}$. Show that

(8.1.1) $|A| \ge 0$ for every $A$; $|A| = 0$ if and only if $A = 0$.

(8.1.2) $|aA| = |a|\,|A|$ for all $a \in \mathbf{C}$.

(8.1.3) $|A + B| \le |A| + |B|$.

(8.1.4) If $A \ge 0$ and $A \ne 0$, then it need not be true that $A > 0$ if either $n$ or $r$ is greater than 1.

(8.1.5) If $A \ge 0$, $B \ge 0$, and $a, b \ge 0$, then $aA + bB \ge 0$.

(8.1.6) If $A \ge B$ and $C \ge D$, then $A + C \ge B + D$.

(8.1.7) If $A \ge B$ and $B \ge C$, then $A \ge C$.


Exercise. Now assume that $A, B, C, D \in M_n$ and that $x, y \in \mathbf{C}^n$. Show that:

(8.1.8) $|Ax| \le |A|\,|x|$.

(8.1.9) $|AB| \le |A|\,|B|$.

(8.1.10) $|A^m| \le |A|^m$ for all $m = 1, 2, \ldots$.

(8.1.11) If $0 \le A \le B$ and $0 \le C \le D$, then $0 \le AC \le BD$.

(8.1.12) If $0 \le A \le B$, then $0 \le A^m \le B^m$ for all $m = 1, 2, \ldots$.

(8.1.13) If $A \ge 0$, then $A^m \ge 0$; if $A > 0$, then $A^m > 0$ for all $m = 1, 2, \ldots$.

(8.1.14) If $A > 0$, $x \ge 0$, and $x \ne 0$, then $Ax > 0$.

(8.1.15) If $A \ge 0$, $x > 0$, and $Ax = 0$, then $A = 0$.

(8.1.16) If $|A| \le |B|$, then $\|A\|_2 \le \|B\|_2$.

(8.1.17) $\|A\|_2 = \|\,|A|\,\|_2$.

Obviously, the last two assertions hold for any absolute norm, of which the Frobenius norm ($l_2$ norm) is only one example. The first application of these simple relations is to an inequality for the spectral radius.


8.1.18 Theorem. Let $A, B \in M_n$. If $|A| \le B$, then $\rho(A) \le \rho(|A|) \le \rho(B)$.


Proof: For every $m = 1, 2, \ldots$ we have $|A^m| \le |A|^m \le B^m$ by (8.1.10) and (8.1.12). Thus, by (8.1.16) and (8.1.17) we have

\[
\|A^m\|_2 \le \|\,|A|^m\|_2 \le \|B^m\|_2 \quad \text{and} \quad \|A^m\|_2^{1/m} \le \|\,|A|^m\|_2^{1/m} \le \|B^m\|_2^{1/m}
\]

for all $m = 1, 2, \ldots$. If we now let $m \to \infty$ and apply (5.6.14), we deduce that $\rho(A) \le \rho(|A|) \le \rho(B)$. □


8.1.19 Corollary. Let $A, B \in M_n$. If $0 \le A \le B$, then $\rho(A) \le \rho(B)$.


8.1.20 Corollary. Let $A \in M_n$. If $A \ge 0$ and if $\tilde{A}$ is any principal submatrix of $A$, then $\rho(\tilde{A}) \le \rho(A)$. In particular, $\max_{i=1,\ldots,n} a_{ii} \le \rho(A)$.

Proof: Let $1 \le r \le n$ and let $\tilde{A}$ be an r-by-r principal submatrix of $A$. Let $\hat{A}$ denote the n-by-n matrix formed by placing the entries of $\tilde{A}$ in their former positions (as entries of $A$) and placing 0's elsewhere. Then $\rho(\tilde{A}) = \rho(\hat{A})$ and $0 \le \hat{A} \le A$, so $\rho(\tilde{A}) = \rho(\hat{A}) \le \rho(A)$ by Corollary (8.1.19). □


The lower bound $a_{ii} \le \rho(A)$ in the preceding corollary is the first nontrivial lower bound we have obtained on the spectral radius of a not-necessarily-Hermitian matrix, but the hypothesis that $A$ is nonnegative is essential.




Exercise. Construct a matrix that is similar to $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and has no zero entries. What is its spectral radius? Is it nonnegative? What does this show about the last part of Corollary (8.1.20)?


Exercise. Show that if $A, B \in M_n$ and $0 \le A < B$, then $\rho(A) < \rho(B)$. Hint: There is some $a > 1$ such that $0 \le A \le aA < B$. The conclusion follows from Corollary (8.1.19) if $\rho(A) \ne 0$, and from Corollary (8.1.20) applied to $B$ if $\rho(A) = 0$.


Since we shall soon have rather good upper bounds on the spectral radius of a nonnegative matrix, Theorem (8.1.18) will be useful in obtaining upper bounds on the spectral radius of an arbitrary matrix.


8.1.21 Lemma. Let $A \in M_n$ and suppose that $A \ge 0$. If the row sums of $A$ are constant, then $\rho(A) = |||A|||_\infty$. If the column sums of $A$ are constant, then $\rho(A) = |||A|||_1$.


Proof: We know that $\rho(A) \le |||A|||$ for any matrix norm $|||\cdot|||$, but if the row sums are constant, $x = [1, \ldots, 1]^T$ is an eigenvector with eigenvalue $|||A|||_\infty$ and so $\rho(A) = |||A|||_\infty$. The statement for column sums follows from applying the same argument to $A^T$. □


8.1.22 Theorem. Let $A \in M_n$ and suppose $A \ge 0$. Then

\[
\min_{1 \le i \le n} \sum_{j=1}^{n} a_{ij} \le \rho(A) \le \max_{1 \le i \le n} \sum_{j=1}^{n} a_{ij} \qquad (8.1.23)
\]

and

\[
\min_{1 \le j \le n} \sum_{i=1}^{n} a_{ij} \le \rho(A) \le \max_{1 \le j \le n} \sum_{i=1}^{n} a_{ij} \qquad (8.1.24)
\]


Proof: Let $\alpha = \min_{1 \le i \le n} \sum_{j=1}^{n} a_{ij}$ and construct a new matrix $B$ with $A \ge B \ge 0$ and $\sum_{j=1}^{n} b_{ij} = \alpha$ for all $i = 1, 2, \ldots, n$. For example, if $\alpha = 0$, we set $B = 0$, and if $\alpha > 0$, we could set $b_{ij} = \alpha a_{ij} (\sum_{j=1}^{n} a_{ij})^{-1}$. By Lemma (8.1.21), $\rho(B) = \alpha$, and $\rho(B) \le \rho(A)$ by Corollary (8.1.19). The upper bound is easily established in a similar fashion. The column sum bounds follow from applying the row sum bounds to $A^T$. □


Exercise. Prove the upper bound assertions in the preceding result.


8.1.25 Corollary. Let $A \in M_n$. If $A \ge 0$ and $\sum_{j=1}^{n} a_{ij} > 0$ for all $i = 1, 2, \ldots, n$, then $\rho(A) > 0$. In particular, $\rho(A) > 0$ if $A > 0$ or if $A$ is irreducible and nonnegative.


Exercise. Show that an irreducible matrix cannot have a zero row or a zero column.


Since $\rho(S^{-1}AS) = \rho(A)$ whenever $S$ is invertible, we may generalize the above theorem by introducing some free parameters. If $S = \mathrm{diag}(x_1, \ldots, x_n)$ and if all $x_i > 0$, then $S^{-1}AS \ge 0$ if $A \ge 0$. Applying Theorem (8.1.22) to $S^{-1}AS = [a_{ij} x_j x_i^{-1}]$, we obtain the following more general result.


8.1.26 Theorem. Let $A \in M_n$ and suppose $A \ge 0$. Then for any positive vector $x \in \mathbf{C}^n$ we have

\[
\min_{1 \le i \le n} \frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j \le \rho(A) \le \max_{1 \le i \le n} \frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j \qquad (8.1.27)
\]

and

\[
\min_{1 \le j \le n} x_j \sum_{i=1}^{n} \frac{a_{ij}}{x_i} \le \rho(A) \le \max_{1 \le j \le n} x_j \sum_{i=1}^{n} \frac{a_{ij}}{x_i} \qquad (8.1.28)
\]


8.1.29 Corollary. Let $A \in M_n$, let $x \in \mathbf{R}^n$, and suppose that $A \ge 0$ and $x > 0$. If $\alpha, \beta \ge 0$ are such that $\alpha x \le Ax \le \beta x$, then $\alpha \le \rho(A) \le \beta$. If $\alpha x < Ax$, then $\alpha < \rho(A)$; if $Ax < \beta x$, then $\rho(A) < \beta$.


Proof: If $\alpha x \le Ax$, then $\alpha \le \min_{1 \le i \le n} x_i^{-1} \sum_{j=1}^{n} a_{ij} x_j$. We conclude that $\alpha \le \rho(A)$ by the theorem. If $\alpha x < Ax$, then there is some $\alpha' > \alpha$ such that $\alpha' x \le Ax$. In this event, $\rho(A) \ge \alpha' > \alpha$, so $\rho(A) > \alpha$. The upper bounds are verified similarly. □


Exercise. Complete the proof of Corollary (8.1.29).


8.1.30 Corollary. Let $A \in M_n$ and suppose that $A$ is nonnegative. If $A$ has a positive eigenvector, then the corresponding eigenvalue is $\rho(A)$; that is, if $Ax = \lambda x$ and $x > 0$ and $A \ge 0$, then $\lambda = \rho(A)$.


Proof: If $x > 0$ and $Ax = \lambda x$, then $\lambda \ge 0$ and $\lambda x \le Ax \le \lambda x$. But then $\lambda \le \rho(A) \le \lambda$ by Corollary (8.1.29). □


8.1.31 Corollary. Let $A \in M_n$ and suppose that $A$ is nonnegative. If $A$ has a positive eigenvector, then

\[
\rho(A) = \max_{x > 0} \min_{1 \le i \le n} \frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j = \min_{x > 0} \max_{1 \le i \le n} \frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j \qquad (8.1.32)
\]
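The two-sided bounds in (8.1.27) hold for every positive test vector, and they tighten as the test vector approaches an eigenvector. The following sketch in plain Python illustrates this; the example matrix, the update rule x ← Ax (a power-iteration step, which is an added device and not part of the text), and the helper names are assumptions.

```python
from math import sqrt

A = [[2.0, 1.0], [1.0, 3.0]]     # hypothetical positive matrix

def matvec(A, x):
    """Matrix-vector product for nested-list A and list x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def bounds(A, x):
    """Lower and upper bounds on rho(A) from (8.1.27) for a positive x."""
    ratios = [yi / xi for yi, xi in zip(matvec(A, x), x)]
    return min(ratios), max(ratios)

x = [1.0, 1.0]                   # with x = (1, 1) the bounds are the row sums
history = []
for _ in range(40):
    history.append(bounds(A, x))
    y = matvec(A, x)             # move x toward the positive eigenvector
    s = sum(y)
    x = [v / s for v in y]       # renormalize so the entries sum to 1

# For this symmetric 2-by-2 example the spectral radius is known in closed form.
rho = (5 + sqrt(5)) / 2
```

The first pair in `history` is `(3.0, 4.0)`, the extreme row sums from (8.1.23); later pairs bracket `rho` ever more tightly, in the spirit of the max-min and min-max characterization (8.1.32).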


Exercise. Prove the preceding result. Use the positive eigenvector $x$ in (8.1.27).


8.1.33 Corollary. Let $A \in M_n$ and suppose that $A$ is nonnegative. If $A$ has a positive eigenvector $x$, then for all $m = 1, 2, \ldots$ and for all $i = 1, \ldots, n$ we have

\[
\left[ \frac{\min_{1 \le k \le n} x_k}{\max_{1 \le k \le n} x_k} \right] \rho(A)^m \le \sum_{j=1}^{n} a_{ij}^{(m)} \le \left[ \frac{\max_{1 \le k \le n} x_k}{\min_{1 \le k \le n} x_k} \right] \rho(A)^m \qquad (8.1.34)
\]

where $A^m = [a_{ij}^{(m)}]$. In particular, if $\rho(A) > 0$, the entries of $[\rho(A)^{-1}A]^m$ are uniformly bounded for $m = 1, 2, \ldots$.


Proof: If $Ax = \rho(A)x$, then $A^m x = \rho(A)^m x$. If $A \ge 0$, then $A^m \ge 0$ and we have

\[
\rho(A)^m \max_{1 \le k \le n} x_k \ge \rho(A)^m x_i = [A^m x]_i = \sum_{j=1}^{n} a_{ij}^{(m)} x_j \ge \left( \min_{1 \le k \le n} x_k \right) \sum_{j=1}^{n} a_{ij}^{(m)}
\]

for any $i = 1, 2, \ldots, n$. Since $x > 0$, the asserted upper bound follows from division. Similarly, we have

\[
\rho(A)^m \min_{1 \le k \le n} x_k \le \rho(A)^m x_i = [A^m x]_i = \sum_{j=1}^{n} a_{ij}^{(m)} x_j \le \left( \max_{1 \le k \le n} x_k \right) \sum_{j=1}^{n} a_{ij}^{(m)}
\]

for any $i = 1, \ldots, n$ and the asserted lower bound follows from division since $x > 0$. □


Problems

1. If $A \ge 0$ and if $A^k > 0$ for some $k$, show that $\rho(A) > 0$.


2. Give an example of a 2-by-2 matrix $A$ such that $A \ge 0$, $A$ is not positive, and $A^2 > 0$.


3. Suppose that $A \ge 0$ and $A \ne 0$. If $A$ has a positive eigenvector, show that $\rho(A) > 0$.


4. If $\rho(A) < 1$, we know that $A^m \to 0$ as $m \to \infty$. If $A \ge 0$ and if $A$ has a positive eigenvector, use Corollary (8.1.33) to show that $|A^m| \le \rho(A)^m C(A)$ for all $m = 1, 2, \ldots$, where $C(A)$ is a constant matrix. Show that the assumption that $A$ has a positive eigenvector cannot be omitted by considering $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Discuss both of these results in the light of Corollary (5.6.13).


5. If $A \ge 0$ has a positive eigenvector, show that $A$ is similar to a nonnegative matrix all of whose row sums are constant. What is this constant? Hint: Use the remarks preceding Theorem (8.1.26).


6. We shall show in Section (8.4) that a nonnegative irreducible matrix must have a positive eigenvector. Show that a nonnegative matrix can have a positive eigenvector and be reducible.


7. Let $A = [a_{ij}] \in M_n$ be nonnegative and have a positive eigenvector $x = [x_i]$. Use (8.1.33) to show that

\[
\text{(a)} \quad \frac{1}{n} \left[ \frac{\min_{1 \le k \le n} x_k}{\max_{1 \le k \le n} x_k} \right] \rho(A)^m \le \max_{1 \le j \le n} a_{ij}^{(m)} \quad \text{for each } i = 1, 2, \ldots, n
\]

\[
\text{(b)} \quad \lim_{m \to \infty} \left[ \sum_{j=1}^{n} a_{ij}^{(m)} \right]^{1/m} = \rho(A) \quad \text{for each } i = 1, 2, \ldots, n
\]

We denote $A^m = [a_{ij}^{(m)}]$ for all $m = 1, 2, \ldots$.


8.2 Positive matrices


The theory of nonnegative matrices assumes its simplest and most elegant form for positive matrices, and it is for this case that O. Perron made the fundamental discoveries in 1907.


8.2.1 Lemma. Let $A \in M_n$ and suppose that $A > 0$, $Ax = \lambda x$, $x \ne 0$, and $|\lambda| = \rho(A)$. Then $A|x| = \rho(A)|x|$ and $|x| > 0$.


Proof: Compute

\[
\rho(A)|x| = |\lambda|\,|x| = |\lambda x| = |Ax| \le |A|\,|x| = A|x|
\]

so that $y \equiv A|x| - \rho(A)|x| \ge 0$. Since $|x| \ge 0$ and $|x| \ne 0$, we know from (8.1.14) that $A|x| > 0$. Corollary (8.1.25) also guarantees that $\rho(A) > 0$, so if $y = 0$, we have $A|x| = \rho(A)|x|$ and $|x| = \rho(A)^{-1}A|x| > 0$. If $y \ne 0$, set $z \equiv A|x| > 0$ and apply (8.1.14) again:

\[
0 < Ay = Az - \rho(A)z \quad \text{or} \quad Az > \rho(A)z
\]

But then by Corollary (8.1.29) we have the absurdity $\rho(A) > \rho(A)$. We conclude that $y = 0$ and we are done. □


From this technical result we easily deduce the first principal result about positive matrices.


8.2.2 Theorem. Let $A \in M_n$ and suppose that $A$ is positive. Then $\rho(A) > 0$, $\rho(A)$ is an eigenvalue of $A$, and there is a positive vector $x$ such that $Ax = \rho(A)x$.


Proof: There is an eigenvalue $\lambda$ with $|\lambda| = \rho(A) > 0$ and an associated eigenvector $x \ne 0$. By the lemma, the required vector is $|x|$. □


Exercise. If $A \in M_n$ and $A > 0$, use Corollary (8.1.31) to show that

\[
\rho(A) = \max_{x > 0} \min_{1 \le i \le n} \frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j = \min_{x > 0} \max_{1 \le i \le n} \frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j
\]


By sharpening the statement of Lemma (8.2.1) slightly, we can improve our knowledge of the location of the eigenvalues of $A$.


8.2.3 Lemma. Let $A \in M_n$ and suppose that $A > 0$, $Ax = \lambda x$, $x \ne 0$, and $|\lambda| = \rho(A)$. Then for some $\theta \in \mathbf{R}$, $e^{-i\theta}x = |x| > 0$.


Proof: The hypothesis guarantees that $|Ax| = |\lambda x| = \rho(A)|x|$, and from Lemma (8.2.1) we know that $A|x| = \rho(A)|x|$ and $|x| > 0$. Combining these two identities and the triangle inequality, we have, for each $k = 1, \ldots, n$,

\[
\rho(A)|x_k| = |\lambda|\,|x_k| = |\lambda x_k| = \left| \sum_{p=1}^{n} a_{kp} x_p \right| \le \sum_{p=1}^{n} |a_{kp}|\,|x_p| = \sum_{p=1}^{n} a_{kp} |x_p| = \rho(A)|x_k|
\]

Thus, equality must hold in the triangle inequality and hence the (nonzero) complex numbers $a_{kp} x_p$, $p = 1, \ldots, n$, must all lie on the same ray in the complex plane. If we denote their common argument by $\theta$, then $e^{-i\theta} a_{kp} x_p > 0$ for all $p = 1, \ldots, n$. But since all $a_{kp} > 0$ we have $e^{-i\theta}x > 0$. □


8.2.4 Theorem. Let $A \in M_n$ and suppose $A$ is positive. Then $|\lambda| < \rho(A)$ for every eigenvalue $\lambda \ne \rho(A)$.


Proof: By definition, $|\lambda| \le \rho(A)$ for all eigenvalues $\lambda$ of $A$. Suppose $|\lambda| = \rho(A)$ and $Ax = \lambda x$, $x \ne 0$. By Lemma (8.2.3), $w \equiv e^{-i\theta}x > 0$ for some $\theta \in \mathbf{R}$, so $Aw = \lambda w$. But then $\lambda = \rho(A)$ by Corollary (8.1.30). □
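For 2-by-2 positive matrices the conclusions of Theorems (8.2.2) and (8.2.4) can be checked directly, because the eigenvalues are available in closed form. The sketch below, in plain Python, samples random positive matrices; the sampling range, the seed, and the helper `eig2` are assumptions made for illustration and cover only the 2-by-2 case.

```python
import random
from math import sqrt

def eig2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] when b*c > 0 (both are then real).

    Returns (larger, smaller).
    """
    tr = a + d
    r = sqrt((a - d) ** 2 + 4 * b * c)
    return (tr + r) / 2, (tr - r) / 2

random.seed(0)
results = []
for _ in range(1000):
    a, b, c, d = (random.uniform(0.01, 1.0) for _ in range(4))
    rho, lam = eig2(a, b, c, d)
    # An eigenvector for rho: the first row of (A - rho I) x = 0 gives
    # (a - rho) * x1 + b * x2 = 0, so x = (b, rho - a) works.
    x = [b, rho - a]
    residual = max(abs(a * x[0] + b * x[1] - rho * x[0]),
                   abs(c * x[0] + d * x[1] - rho * x[1]))
    results.append((rho > 0,                 # rho(A) > 0
                    abs(lam) < rho,          # the other eigenvalue is strictly smaller
                    x[0] > 0 and x[1] > 0,   # positive eigenvector
                    residual < 1e-12))       # A x = rho x up to rounding
```

Every sampled matrix should satisfy all four checks: the spectral radius is a positive eigenvalue, it strictly dominates the other eigenvalue in modulus, and it has a positive eigenvector.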


We now know that if $A > 0$, then $\rho(A)$ is characterized as the eigenvalue of strictly largest modulus; there are no others. The next result says that $\rho(A)$ is an eigenvalue of geometric multiplicity 1; that is, the eigenspace corresponding to $\rho(A)$ has dimension 1. In fact, we shall see shortly that the algebraic multiplicity is also 1.


8.2.5 Theorem. Let $A \in M_n$ and suppose that $A > 0$ and that $w$ and $z$ are nonzero vectors such that $Aw = \rho(A)w$ and $Az = \rho(A)z$. Then there exists some $\alpha \in \mathbf{C}$ such that $w = \alpha z$.


Proof: By Lemma (8.2.3) there exist real numbers $\theta_1$ and $\theta_2$ such that $p \equiv e^{-i\theta_1}z > 0$ and $q \equiv e^{-i\theta_2}w > 0$. Set $\beta \equiv \min_{1 \le i \le n} q_i p_i^{-1}$ and define $r \equiv q - \beta p$. Notice that $r \ge 0$ and at least one coordinate of $r$ is 0, so $r$ is not a positive vector. But $Ar = Aq - \beta Ap = \rho(A)q - \beta\rho(A)p = \rho(A)r$, so if $r \ne 0$, we know by (8.1.14) that $r = \rho(A)^{-1}Ar > 0$. Since this is not true, we conclude that $r = 0$ and hence $q = \beta p$ and $w = \beta e^{i(\theta_2 - \theta_1)}z$. □


8.2.6 Corollary. Let $A \in M_n$ and suppose that $A > 0$. There exists a unique vector $x$ such that $Ax = \rho(A)x$, $x > 0$, and $\sum_{i=1}^{n} x_i = 1$.


Exercise. Prove Corollary (8.2.6).


The unique normalized eigenvector characterized in Corollary (8.2.6) is often called the Perron vector of $A$; $\rho(A)$ is often called the Perron root of $A$. Of course, $A^T$ is a positive matrix if $A$ is, so all the above results apply to $A^T$ as well. The Perron vector of $A^T$ is called the left Perron vector of $A$.


Exercise. If $A \in M_n$ and $A > 0$, and if there is some $x \in \mathbf{C}^n$ such that $x \ge 0$, $x \ne 0$, and $Ax = \lambda x$, show that $x$ is a multiple of the Perron vector of $A$ and that $\lambda = \rho(A)$.


We shall be interested in studying the behavior of powers $A^m$ as $m \to \infty$ because these powers occur in applications to numerical analysis and to the theory of Markov chains in probability. The next lemma isolates the requirements that are essential to the various limit theorems about nonnegative matrices. Notice that all the hypotheses are met if $A > 0$ and $\lambda = \rho(A)$.

8.2.7 Lemma. Let $A \in M_n$ be given, let $\lambda \in C$ be given, and
suppose $x$ and $y$ are vectors such that

(1) $Ax = \lambda x$;
(2) $A^T y = \lambda y$; and
(3) $x^T y = 1$.

Define $L \equiv xy^T$. Then

(a) $Lx = x$ and $y^T L = y^T$;
(b) $L^m = L$ for all $m = 1, 2, \ldots$;
(c) $A^m L = L A^m = \lambda^m L$ for all $m = 1, 2, \ldots$;
(d) $L(A - \lambda L) = 0$;
(e) $(A - \lambda L)^m = A^m - \lambda^m L$ for all $m = 1, 2, \ldots$; and
(f) every nonzero eigenvalue of $A - \lambda L$ is also an eigenvalue of $A$.

If, in addition, we assume that

(4) $\lambda \ne 0$; and
(5) $\lambda$ is an eigenvalue of $A$ with geometric multiplicity 1;

then we also have that

(g) $\lambda$ is not an eigenvalue of $A - \lambda L$; that is,
    $\lambda I - (A - \lambda L)$ is invertible.

Finally, if we assume that

(6) $|\lambda| = \rho(A) > 0$; and
(7) $\lambda$ is the only eigenvalue of $A$ with modulus $\rho(A)$;

and if we order the eigenvalues of $A$ as
$|\lambda_1| \le |\lambda_2| \le \cdots \le |\lambda_{n-1}| < |\lambda_n| = |\lambda| = \rho(A)$,
then

(h) $\rho(A - \lambda L) \le |\lambda_{n-1}| < \rho(A)$;
(i) $(\lambda^{-1}A)^m = L + (\lambda^{-1}A - L)^m \to L$ as $m \to \infty$; and
(j) for every $r$ such that $[|\lambda_{n-1}|/\rho(A)] < r < 1$ there exists
    some $C = C(r, A)$ such that
    $\|(\lambda^{-1}A)^m - L\|_\infty < C r^m$ for all $m = 1, 2, \ldots$.

Proof: Conclusions (a), (b), and (c) follow directly from assumptions (1),
(2), and (3); notice that (3) implies that both $x$ and $y$ are nonzero.
Conclusion (d) follows from (b) and (c). Statement (e) may be proved
inductively using (b) and (c). If $\mu \ne 0$ is an eigenvalue of
$A - \lambda L$ and if $(A - \lambda L)w = \mu w$ for some $w \ne 0$, then
$L(A - \lambda L)w = 0w = \mu Lw$ and hence $Lw = 0$. Thus,
$(A - \lambda L)w = Aw = \mu w$, so $\mu$ is also an eigenvalue of $A$ and
(f) is proved.

If we now invoke hypothesis (4) and take $\mu = \lambda$, this argument
shows that if $w$ were a $\lambda$ eigenvector of $A - \lambda L$, then it
would also be a $\lambda$ eigenvector of $A$. On the basis of hypothesis (5)
we must then conclude that $w = \alpha x$ for some $\alpha \ne 0$. But then
$\mu w = \lambda w = (A - \lambda L)w = (A - \lambda L)\alpha x =
\alpha\lambda x - \lambda\alpha x = 0$, which is impossible since
$\lambda \ne 0$ and $w \ne 0$. This contradiction establishes (g). Because
of (f), we know either that $\rho(A - \lambda L) = |\lambda_k|$ for some
eigenvalue $\lambda_k$ of $A$ or that $\rho(A - \lambda L) = 0$. Since we
have ordered the eigenvalues of $A$ by increasing modulus and
$|\lambda_n| = |\lambda| = \rho(A)$, we know in either event from (g) that
$\rho(A - \lambda L) \le |\lambda_{n-1}|$. Thus the inequality in (h)
follows directly from (7). Combining (h) with (e), we calculate easily that
$(\lambda^{-1}A - L)^m = (\lambda^{-1}A)^m - L \to 0$ as $m \to \infty$ since
$\rho(\lambda^{-1}A - L) = \rho(A - \lambda L)/\rho(A) \le
|\lambda_{n-1}|/\rho(A) < 1$. The convergence rate in (j) is a direct
consequence of Corollary (5.6.13) applied to the matrix $\lambda^{-1}A - L$
with $\epsilon$ chosen so that $\rho(\lambda^{-1}A - L) + \epsilon \le
[|\lambda_{n-1}|/\rho(A)] + \epsilon < r < 1$. □

Exercise. Supply the details in the proof of (a), (b), and (c) in the lemma.

8.2.8 Theorem. Let $A \in M_n$ and suppose that $A > 0$. Then

$$\lim_{m \to \infty} [\rho(A)^{-1}A]^m = L$$

where $L \equiv xy^T$, $Ax = \rho(A)x$, $A^T y = \rho(A)y$, $x > 0$,
$y > 0$, and $x^T y = 1$.

Proof: The assumptions (1)-(7) of the lemma are met with
$\lambda = \rho(A)$, $x$ the Perron vector of $A$, and
$y = (x^T z)^{-1} z$, where $z$ is the Perron vector of $A^T$. The
conclusion follows from (i). □

8.2.9 Corollary. If $A \in M_n$ and $A > 0$, then
$L = \lim_{m \to \infty} [\rho(A)^{-1}A]^m$ is a positive matrix of rank 1.

8.2.10 Theorem. If $A \in M_n$ and $A > 0$, then $\rho(A)$ is an eigenvalue
of algebraic multiplicity 1; that is, $\rho(A)$ is a simple root of the
characteristic equation $p_A(t) = 0$.


Proof: By the Schur triangularization theorem (2.3.1) we may write
$A = U \Delta U^*$, where $U$ is unitary, $\Delta$ is an upper triangular
matrix with main diagonal entries $\rho, \ldots, \rho, \lambda_{k+1},
\ldots, \lambda_n$, and $\rho = \rho(A)$ is an eigenvalue of algebraic
multiplicity $k \ge 1$; the eigenvalues $\lambda_i$ all have modulus
strictly less than $\rho(A)$ for all $i = k+1, \ldots, n$. But then

$$
L = \lim_{m \to \infty} [\rho(A)^{-1}A]^m
= U \left( \lim_{m \to \infty}
\begin{bmatrix}
1 & & & & & * \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & & \lambda_{k+1}/\rho & & \\
 & & & & \ddots & \\
0 & & & & & \lambda_n/\rho
\end{bmatrix}^m \right) U^*
= U
\begin{bmatrix}
1 & & & & & * \\
 & \ddots & & & & \\
 & & 1 & & & \\
 & & & 0 & & \\
 & & & & \ddots & \\
0 & & & & & 0
\end{bmatrix}
U^*
$$

where the diagonal entry 1 is repeated $k$ times in the last two
expressions and the diagonal entry 0 is repeated $n-k$ times in the last
expression. Since the upper triangular matrix in the last expression has
rank at least $k$, and since $L$ has rank 1, we conclude that $k > 1$ is
impossible. □

We now summarize the principal results obtained in this section for
positive matrices.

8.2.11 Perron's theorem. If $A \in M_n$ and $A > 0$, then

(a) $\rho(A) > 0$;
(b) $\rho(A)$ is an eigenvalue of $A$;
(c) There is an $x \in C^n$ with $x > 0$ and $Ax = \rho(A)x$;
(d) $\rho(A)$ is an algebraically (and hence geometrically) simple
    eigenvalue of $A$;
(e) $|\lambda| < \rho(A)$ for every eigenvalue $\lambda \ne \rho(A)$, that
    is, $\rho(A)$ is the unique eigenvalue of maximum modulus; and
(f) $[\rho(A)^{-1}A]^m \to L$ as $m \to \infty$, where $L \equiv xy^T$,
    $Ax = \rho(A)x$, $A^T y = \rho(A)y$, $x > 0$, $y > 0$, and $x^T y = 1$.

Perron's theorem has many applications. One elegant and useful
application is to obtain an eigenvalue inclusion region for a matrix $A$ in
terms of the spectral radius and main diagonal entries of a dominating
nonnegative matrix.

8.2.12 Theorem (Ky Fan). Let $A = [a_{ij}] \in M_n$. Suppose that
$B = [b_{ij}] \in M_n$ has nonnegative entries and $b_{ij} \ge |a_{ij}|$
for all $i \ne j$. Then every eigenvalue of $A$ lies in the region

$$\bigcup_{i=1}^{n} \{ z \in C : |z - a_{ii}| \le \rho(B) - b_{ii} \}$$

Proof: We may assume that $B > 0$, for if some entries of $B$ are zero we
may consider $B_\epsilon \equiv [b_{ij} + \epsilon]$ for $\epsilon > 0$;
$b_{ij} + \epsilon > |a_{ij}|$ for all $i \ne j$, and
$\rho(B_\epsilon) - (b_{ii} + \epsilon) \to \rho(B) - b_{ii}$ as
$\epsilon \to 0$. By Perron's theorem there is a positive vector $x$ such
that $Bx = \rho(B)x$, and hence

$$\sum_{j \ne i} b_{ij} x_j = \rho(B)x_i - b_{ii}x_i
\quad \text{for all } i = 1, 2, \ldots, n$$

Thus, we have

$$\frac{1}{x_i} \sum_{j \ne i} |a_{ij}| x_j \le \rho(B) - b_{ii}
\quad \text{for all } i = 1, 2, \ldots, n$$

The result follows from Corollary (6.1.6) with $p_i = x_i$. □

Part (f) of (8.2.11) guarantees that a certain limit exists, and (j) of
(8.2.7) gives an upper bound on the rate of convergence

$$\| [\rho(A)^{-1}A]^m - L \|_\infty < C r^m$$

for some positive constant $C$, which depends on $A$ and $r$, for any $r$
such that

$$\frac{|\lambda_{n-1}|}{\rho(A)} < r < 1$$

where $\lambda_{n-1}$ is a second largest modulus eigenvalue of $A$. Even if
$\rho(A)$ is known or easily estimated, it can be inconvenient or
impossible to compute or estimate $|\lambda_{n-1}|$ in order to get a useful
bound on the ratio $|\lambda_{n-1}|/\rho(A)$. In such a situation it can be
useful to know an easily computed bound due to E. Hopf, which holds for any
positive matrix $A = [a_{ij}] \in M_n$:

$$\frac{|\lambda_{n-1}|}{\rho(A)} \le \frac{M - \mu}{M + \mu} < 1$$

where $M = \max\{a_{ij} : i, j = 1, 2, \ldots, n\}$ and
$\mu = \min\{a_{ij} : i, j = 1, 2, \ldots, n\}$.
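The limit in Theorem (8.2.8) and Hopf's bound can be checked numerically on a small example. The following is only a sketch under stated assumptions: the 2-by-2 matrix, the helper names (`perron`, `matmul`, and so on), and the use of plain power iteration to approximate the Perron root and vectors are all illustrative choices, not part of the text.

```python
# Sketch: check [rho(A)^{-1} A]^m -> L = x y^T and Hopf's bound for one
# positive 2-by-2 matrix. All names here are hypothetical helpers.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def perron(A, iters=500):
    """Power iteration for a positive matrix; returns (rho, x), sum(x) == 1."""
    n = len(A)
    x = [1.0 / n] * n
    rho = 0.0
    for _ in range(iters):
        w = matvec(A, x)
        rho = sum(w)               # sum(x) == 1, so sum(Ax) estimates rho(A)
        x = [wi / rho for wi in w]
    return rho, x

A = [[2.0, 1.0], [1.0, 3.0]]       # assumed example, A > 0
rho, x = perron(A)                 # right Perron vector of A
_, z = perron(transpose(A))        # Perron vector of A^T
s = sum(xi * zi for xi, zi in zip(x, z))
y = [zi / s for zi in z]           # y = (x^T z)^{-1} z, so x^T y = 1
L = [[xi * yj for yj in y] for xi in x]

# Form [rho(A)^{-1} A]^60 by repeated multiplication.
B = [[a / rho for a in row] for row in A]
P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(60):
    P = matmul(P, B)
err = max(abs(P[i][j] - L[i][j]) for i in range(2) for j in range(2))

# Hopf's bound; the second eigenvalue comes from the trace (2-by-2 only).
lam2 = A[0][0] + A[1][1] - rho
M = max(max(row) for row in A)
mu = min(min(row) for row in A)
hopf = (M - mu) / (M + mu)
ratio = abs(lam2) / rho
```

For this matrix the ratio $|\lambda_{n-1}|/\rho(A)$ is below the Hopf bound $(M-\mu)/(M+\mu) = 1/2$, and the entries of the 60th power agree with those of $L$ to within rounding.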


Problems

1. If $A > 0$, if $x$ is the Perron vector of $A$, and if $z$ is the Perron
vector of $A^T$, show that $x^T z > 0$.

2. If $A \in M_n$ is an upper triangular matrix with $k$ nonzero main
diagonal entries, show that rank $A \ge k$. Show by example that it is
possible to have the rank greater than $k$ under these conditions.

3. Apply the results derived in this section to the matrix

$$A = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix},
\qquad 0 < \alpha, \beta < 1$$

and compare with the conclusions reached in Section (8.0).

4. Consider the general intercity migration model with $n \ge 2$ cities as
described in Section (8.0). If all the migration coefficients $a_{ij}$ are
positive, what is the asymptotic behavior of the population distribution
$p^{(m)}$ as $m \to \infty$?

5. If $A > 0$, describe in detail the asymptotic behavior of $A^m$ as
$m \to \infty$. Hint: There are three cases: $A^m \to 0$, $A^m$ diverges,
and $A^m$ converges to a positive matrix. Characterize and analyze each
case.

6. After Corollary (6.1.8) there was an exercise dealing with a 2-by-2
positive matrix. Discuss this example in light of the exercise following
Theorem (8.2.2).

7. Let $A, B \in M_n$ and suppose that $A > B > 0$. Use the "min max"
characterization of $\rho(B)$ to show that $\rho(A) > \rho(B)$. Hint: Let
$x$ be the Perron vector of $A$ so that $Ax > Bx$.

8. If $A > 0$ and if $x$ is the Perron vector of $A$, show that

$$\rho(A) = \sum_{i,j=1}^{n} a_{ij} x_j$$

Recall that $x_1 + \cdots + x_n = 1$ by definition.

9. If a positive matrix is nonsingular, show that the inverse matrix
cannot be nonnegative. If a nonnegative matrix $A$ is nonsingular, show that
the inverse matrix can be nonnegative only if $A$ has exactly one nonzero
entry in each column. How is such a matrix related to a permutation
matrix?

10. Provide the details for the following alternate proof of Theorem
(8.2.10): If $\rho = \rho(A)$ has algebraic multiplicity $k > 1$ and if $y$
and $x$ are the left and right Perron vectors of $A$, respectively, then
there is a nonzero vector $z$ such that $x = (A - \rho I)z$. But then
$y^T x = y^T (A - \rho I)z = 0^T z = 0$, which is impossible since
$y^T x > 0$. Hint: Since the geometric multiplicity of $\rho$ is 1, the
Jordan form of $A - \rho I$ must have exactly one nilpotent block, which
must have size at least 2. Show that any nilpotent matrix $B \in M_k$,
$k > 1$, of rank $k-1$ has the property that if $Bu = 0$ for some
$u \in C^k$, then there is some $v \in C^k$ such that $Bv = u$.

Further Reading. For a unified treatment of a family of bounds that
includes Hopf's bound mentioned in the last paragraph (and an extensive
bibliography) see U. Rothblum and C. Tan, "Upper Bounds on the Maximum
Modulus of Subdominant Eigenvalues of Nonnegative Matrices," Linear
Algebra Appl. 66 (1985), 45-86. See also [Kel] chapter II, theorem 2.


8.3 Nonnegative matrices

If one is confronted in practice with nonnegative matrices that are not
positive, it is necessary to consider the extension of the theory developed
in the preceding section to the case in which not all of the matrix entries
are strictly positive. One might hope that this extension could be done by
taking suitable limits, and this is the case for some results. But
unfortunately, quantities such as rank and dimension are not continuous
functions, so limit arguments are only partially applicable. The only
results in Perron's theorem that generalize by taking limits are contained
in the following theorem.

8.3.1 Theorem. If $A \in M_n$ and $A \ge 0$, then $\rho(A)$ is an
eigenvalue of $A$ and there is a nonnegative vector $x \ge 0$, $x \ne 0$,
such that $Ax = \rho(A)x$.

Proof: For any $\epsilon > 0$, define
$A(\epsilon) \equiv [a_{ij} + \epsilon] > 0$. Denote by $x(\epsilon)$ the
Perron vector of $A(\epsilon)$, so $x(\epsilon) > 0$ and
$\sum_{i=1}^{n} x(\epsilon)_i = 1$. Since the set of vectors
$\{x(\epsilon) : \epsilon > 0\}$ is contained in the compact set
$\{x : x \in C^n, \|x\|_1 \le 1\}$, there is a monotone decreasing sequence
$\epsilon_1, \epsilon_2, \ldots$ with $\lim_{k \to \infty} \epsilon_k = 0$
such that $\lim_{k \to \infty} x(\epsilon_k) \equiv x$ exists. Since
$x(\epsilon_k) > 0$ for all $k = 1, 2, \ldots$, it must be that
$x = \lim_{k \to \infty} x(\epsilon_k) \ge 0$; $x = 0$ is impossible because

$$\sum_{i=1}^{n} x_i = \lim_{k \to \infty} \sum_{i=1}^{n} x(\epsilon_k)_i = 1$$

By Theorem (8.1.18), $\rho(A(\epsilon_k)) \ge \rho(A(\epsilon_{k+1})) \ge
\cdots \ge \rho(A)$ for all $k = 1, 2, \ldots$, so the sequence of real
numbers $\{\rho(A(\epsilon_k))\}_{k=1,2,\ldots}$ is a monotone decreasing
sequence. Thus, $\rho \equiv \lim_{k \to \infty} \rho(A(\epsilon_k))$
exists and $\rho \ge \rho(A)$. But from the fact that

$$Ax = \lim_{k \to \infty} A(\epsilon_k)x(\epsilon_k)
= \lim_{k \to \infty} \rho(A(\epsilon_k))x(\epsilon_k)
= \lim_{k \to \infty} \rho(A(\epsilon_k)) \lim_{k \to \infty} x(\epsilon_k)
= \rho x$$

and the fact that $x \ne 0$, we deduce that $\rho$ is an eigenvalue of $A$.
But then $\rho \le \rho(A)$, so it must be that $\rho = \rho(A)$. □
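The perturbation argument in the proof of Theorem (8.3.1) can be watched numerically. The sketch below assumes a particular reducible matrix $A$ with $\rho(A) = 1$ and uses a hypothetical power-iteration helper `perron` to approximate $\rho(A(\epsilon))$ and $x(\epsilon)$; neither the example nor the helper comes from the text.

```python
import math

def perron(A, iters=20000):
    """Power iteration for a positive matrix; returns (rho, x), sum(x) == 1."""
    n = len(A)
    x = [1.0 / n] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        rho = sum(w)               # sum(x) == 1, so sum(Ax) estimates rho
        x = [wi / rho for wi in w]
    return rho, x

A = [[1.0, 1.0], [0.0, 1.0]]       # assumed example: A >= 0, reducible, rho(A) = 1
eps_list = [1.0, 0.1, 0.01, 0.001]  # a decreasing sequence of perturbations

results = []
for eps in eps_list:
    A_eps = [[a + eps for a in row] for row in A]   # A(eps) = [a_ij + eps] > 0
    results.append(perron(A_eps))

rhos = [r for r, _ in results]     # rho(A(eps_k)), decreasing toward rho(A)
x_last = results[-1][1]            # x(eps_k) for the smallest eps

# For this 2-by-2 example the Perron root of A(eps) has a closed form.
exact = [(1 + e) + math.sqrt(e * (1 + e)) for e in eps_list]
```

Here the computed values of $\rho(A(\epsilon_k))$ decrease toward $\rho(A) = 1$, and the vectors $x(\epsilon_k)$ approach the nonnegative (but not positive) eigenvector $[1, 0]^T$ of $A$.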


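There is a generalization of part of the variational characterization
(8.1.31) of the spectral radius to general nonnegative matrices and
nonnegative vectors, but the proof is quite different.

8.3.2 Theorem. Let $A \in M_n$, $A \ge 0$, $x \in C^n$, $x \ge 0$, and
$x \ne 0$. If $Ax \ge \alpha x$ for some $\alpha \in R$, then
$\rho(A) \ge \alpha$.

Proof: Let $A = [a_{ij}]$, let $\epsilon > 0$, and define
$A(\epsilon) \equiv [a_{ij} + \epsilon]$. Then $A(\epsilon) > 0$, so
$A(\epsilon)$ has a positive left Perron vector $y(\epsilon)$; that is,
$y(\epsilon)^T A(\epsilon) = \rho(A(\epsilon)) y(\epsilon)^T$. We are given
that $Ax - \alpha x \ge 0$, so
$A(\epsilon)x - \alpha x > Ax - \alpha x \ge 0$ and hence
$y(\epsilon)^T [A(\epsilon)x - \alpha x] =
[\rho(A(\epsilon)) - \alpha] y(\epsilon)^T x > 0$. Since
$y(\epsilon)^T x > 0$, we have $\rho(A(\epsilon)) - \alpha > 0$ for all
$\epsilon > 0$. But $\rho(A(\epsilon)) \to \rho(A)$ as $\epsilon \to 0$, so
we conclude that $\rho(A) \ge \alpha$. □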


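8.3.3 Corollary. If $A \in M_n$ and $A \ge 0$, then

$$\rho(A) = \max_{\substack{x \ge 0 \\ x \ne 0}} \;
\min_{\substack{1 \le i \le n \\ x_i \ne 0}} \;
\frac{1}{x_i} \sum_{j=1}^{n} a_{ij} x_j$$

Proof: If $A \ge 0$, $x \ge 0$, and $x \ne 0$, and if we choose

$$\alpha \equiv \min_{x_i \ne 0} \sum_{j=1}^{n} \frac{a_{ij} x_j}{x_i}$$

then $Ax \ge \alpha x$ and so $\alpha \le \rho(A)$ by the theorem. But if we
choose $x$ to be the eigenvector whose existence is guaranteed by Theorem
(8.3.1), then we see that this upper bound can be attained with
$\alpha = \rho(A)$. □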
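The "max min" characterization in Corollary (8.3.3) is easy to experiment with. In the sketch below the matrix, the sample vectors, and the helper name `max_min_lower_bound` are assumptions made for illustration; the matrix is chosen so that $\rho(A) = 2$ with eigenvector $[1, 1]^T$ can be verified by hand.

```python
import random

def max_min_lower_bound(A, x):
    """Return min over i with x_i != 0 of (Ax)_i / x_i, for x >= 0, x != 0."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return min(Ax[i] / x[i] for i in range(len(x)) if x[i] != 0)

# Assumed example: A >= 0 with eigenvalues 2 and -1, so rho(A) = 2,
# and A [1, 1]^T = 2 [1, 1]^T.
A = [[0.0, 2.0], [1.0, 1.0]]
rho = 2.0

random.seed(0)
samples = [[random.random(), random.random()] for _ in range(1000)]
samples += [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]   # includes vectors with zeros

# Every nonnegative nonzero x gives a lower bound for rho(A) ...
best = max(max_min_lower_bound(A, x) for x in samples)
# ... and the bound is attained at a nonnegative eigenvector.
at_eigvec = max_min_lower_bound(A, [1.0, 1.0])
```

Each sampled vector gives a value no larger than $\rho(A)$, and the eigenvector attains it, which is exactly the content of the corollary for this one matrix.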


Exercise. Consider A = F s] and x= [ | to show that the upper bound
of (8.1.29) need not hold if $x$ is not a positive vector. Show that the
"min max" characterization in (8.1.32) is also false in general. As the
preceding result shows, however, the "max min" characterization does
generalize.

With an additional assumption, Theorem (8.3.2) can be strengthened
slightly to give some information about the vector $x$.

8.3.4 Theorem. Let $A \in M_n$ and $A \ge 0$, and suppose that $A$ has a
positive left eigenvector. If $x \ge 0$, if $x \ne 0$, and if
$Ax \ge \rho(A)x$, then $Ax = \rho(A)x$.

Proof: Let $y > 0$ be such that $A^T y = \rho(A)y$ and suppose that
$x \ge 0$ is such that $x \ne 0$ and $Ax - \rho(A)x \ge 0$. Then

$$y^T [Ax - \rho(A)x] = \rho(A) y^T x - \rho(A) y^T x = 0$$

so it must be that $Ax - \rho(A)x = 0$. □

Without additional assumptions, we can go no further than Theorem
(8.3.1) in generalizing Perron's theorem (8.2.11) to nonnegative matrices.

When $A \in M_n$ and $A \ge 0$, the nonnegative eigenvalue $\rho(A)$ is
called the Perron root of $A$. Because an eigenvector associated with the
Perron root of a nonnegative matrix is not necessarily uniquely determined,
there is (unlike the situation when $A$ is positive) no well-determined
notion of "the Perron vector" for a general nonnegative matrix. For
example, the nonnegative matrix $A = I$ has every nonnegative vector as an
eigenvector associated with the Perron root $\rho(A) = 1$.


Problems

1. Show by example that the items from Perron's theorem (8.2.11) that
are not included in Theorem (8.3.1) are not generally true of all
nonnegative matrices. Hint: Consider lo ol lo ih and [i ol-

2. If $A \ge 0$ and $A^k > 0$ for some $k \ge 1$, show that $A$ has a
positive eigenvector.

3. If $A \ge 0$ has a nonnegative eigenvector $x$ with $r \ge 1$ positive
components and $n-r$ zero components, show that by a permutation
similarity, $A$ can be brought into the form
$\begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$, where $B \in M_r$,
$C \in M_{r,n-r}$, $D \in M_{n-r}$; $B$, $C$, and $D$ are nonnegative; and
$B$ has a positive eigenvector. If $r < n$, conclude that $A$ must be
reducible.

$x \ne 0$, then $Ax = \rho(A)x$.

5. Consider the matrix A = lo ‘| and the vector $x = [1, 2]^T$. Show that
Theorem (8.3.4) is not correct if we assume only that $A \ge 0$. What are
the left and right Perron vectors of $A$?

6. If $A \ge 0$, show that there exists a positive matrix $B$ that
commutes with $A$ if and only if $A$ has left and right eigenvectors, each
of which is positive. Hint: Let $B = xy^T$ if $x$ and $y$ are positive
right and left eigenvectors of $A$. Conversely, if $x \ge 0$ and
$Ax = \rho(A)x$, consider $BAx = ABx = B\rho(A)x > 0$.


7. If $A = [a_{ij}] \in M_n$ is nonnegative and tridiagonal, show that all
the eigenvalues of $A$ are real. Hint: First show that if all sub- and
superdiagonal entries are positive, then a positive diagonal matrix $D$ may
be found so that $D^{-1}AD$ is symmetric. Then show that 0 entries above or
below the diagonal are inconsequential.

8. Let a nonnegative $A \in M_n$ be given. Show that either $A$ is
irreducible or there exists a permutation matrix $P$ such that

$$P^T A P = \begin{bmatrix} A_1 & & * \\ & \ddots & \\ 0 & & A_k \end{bmatrix}$$

in which each $A_i$ is either irreducible or is the 1-by-1 zero matrix,
$i = 1, \ldots, k$. This is called the irreducible normal form of $A$. Note
that $\sigma(A) = \bigcup_{i=1}^{k} \sigma(A_i)$ and that the irreducible
normal form of $A$ is not necessarily unique.

9. A matrix $A = [a_{ij}] \in M_n(R)$ all of whose off-diagonal entries
$a_{ij}$, $i \ne j$, are nonnegative is said to be essentially nonnegative.
Show that if $A$ is essentially nonnegative, then there is some
$\lambda > 0$ such that $\lambda I + A \ge 0$. Use this observation and
(8.3.1) to show that if $A \in M_n$ is essentially nonnegative, then $A$ has
a real eigenvalue $r(A)$ (often called the dominant eigenvalue of $A$) with
the property that $r(A) \ge \mathrm{Re}\, \lambda_i$ for every eigenvalue
$\lambda_i$ of $A$. Show that $r(A)$ need not be the eigenvalue of $A$ with
largest modulus, but when $A \ge 0$, $r(A) = \rho(A)$. Hint: The eigenvalues
of $\lambda I + A$ are $\lambda + \lambda_i$.

10. Theorem (8.1.18) says that if $A \in M_n$ is nonnegative, then
$\rho(A + B) \ge \rho(A)$ whenever $B \in M_n$ is also nonnegative; this is
a sort of monotonicity result for the spectral radius. If $A \in M_n$ is
essentially nonnegative (see Problem 9), show that $A + D$ is essentially
nonnegative for all diagonal matrices $D \in M_n(R)$. If $A$ is a given
essentially nonnegative matrix and $D$ is allowed to vary in the class of
real diagonal matrices, it is known that the dominant eigenvalue $r(A + D)$
is a convex function of the diagonal entries of $D$.

Further Readings. See C. Johnson, R. Kellogg, and A. Stephens, "Complex
Eigenvalues of a Nonnegative Matrix with a Specified Graph II," Lin.
Multilin. Alg. 7 (1979), 129-143, and C. Johnson, "Row Stochastic Matrices
Similar to Doubly Stochastic Matrices," Lin. Multilin. Alg. 10 (1981),
113-130 for results on the important topic of eigenvalue possibilities of
nonnegative matrices. These papers include references to classical work of
Dmitriev, Dynkin, and Karpelevich and to the nonnegative inverse eigenvalue
problem. The latter problem (of characterizing the sets of complex numbers
that can be spectra of nonnegative matrices) is unsolved. For more
information about Problem 10, see J. Cohen, "Convexity of the Dominant
Eigenvalue of an Essentially Nonnegative Matrix," Proc. Amer. Math. Soc.
81 (1981), 657-658. See also Problem 15 in Section (8.4).


8.4 Irreducible nonnegative matrices

It is a useful heuristic principle that if one can prove a result for
matrices with no 0 entries, then the result often generalizes to
irreducible matrices. We have had one instance of this principle in the
extensions of the basic Geršgorin theorem in Chapter 6, and we shall now
have another. The basic idea has already been proved in Theorem (6.2.24);
we restate the relevant portion here.

8.4.1 Lemma. Let $A \in M_n$ and suppose $A \ge 0$. Then $A$ is irreducible
if and only if $(I + A)^{n-1} > 0$.

Exercise. If $A \in M_n$, show that $A$ is irreducible if and only if $A^T$
is irreducible.

For our purposes, we also need the following simple results.

8.4.2 Lemma. Let $A \in M_n$ and let $\lambda_1, \ldots, \lambda_n$ be the
eigenvalues of $A$ (including multiplicities). Then
$\lambda_1 + 1, \ldots, \lambda_n + 1$ are the eigenvalues of $I + A$ and
$\rho(I + A) \le 1 + \rho(A)$. If $A \ge 0$, then
$\rho(I + A) = 1 + \rho(A)$.

Proof: If $\lambda \in \sigma(A)$ has multiplicity $k$, then $\lambda$ is a
root of the characteristic equation $p_A(t) = \det(tI - A) = 0$ of
multiplicity $k$. But then $\lambda + 1$ is a root of
$p_{A+I}(s) = \det[sI - (A + I)] = 0$ of multiplicity $k$ because
$\det(tI - A) = \det[(t+1)I - (A + I)]$. Thus,
$\lambda_1 + 1, \ldots, \lambda_n + 1$ are the eigenvalues of $A + I$.
Therefore, $\rho(I + A) = \max_{1 \le i \le n} |\lambda_i + 1| \le
\max_{1 \le i \le n} |\lambda_i| + 1 = 1 + \rho(A)$. However, by (8.3.1),
$1 + \rho(A)$ is an eigenvalue of $I + A$ when $A \ge 0$, so that
$\rho(I + A) = 1 + \rho(A)$ in this case. □

Exercise. Explain why the following argument in support of the first part
of the preceding lemma is incomplete: If $\lambda$ is an eigenvalue of $A$,
then there is some vector $x \ne 0$ such that $Ax = \lambda x$. But then
$(A + I)x = (\lambda + 1)x$, so $\lambda + 1$ is an eigenvalue of $A + I$.
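The criterion of Lemma (8.4.1) translates directly into a test for irreducibility. The following sketch is one possible way to do it, assuming $n \ge 2$ and a nonnegative matrix given as a list of rows; the function name `is_irreducible` and the two test matrices are illustrative assumptions. Because only the zero pattern of $(I + A)^{n-1}$ matters when $A \ge 0$, the code tracks 0/1 patterns rather than numerical values.

```python
def is_irreducible(A):
    """Test (I + A)^(n-1) > 0 for a nonnegative n-by-n matrix A, n >= 2.

    Only the positions of the nonzero entries matter for a nonnegative
    matrix, so the powers are formed on 0/1 patterns.
    """
    n = len(A)
    # Pattern of I + A: 1 where the entry is nonzero, 0 elsewhere.
    B = [[1 if (i == j or A[i][j] != 0) else 0 for j in range(n)]
         for i in range(n)]
    # Start from the identity pattern and multiply by B a total of n-1 times.
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(n - 1):
        P = [[1 if any(P[i][k] and B[k][j] for k in range(n)) else 0
              for j in range(n)] for i in range(n)]
    return all(all(row) for row in P)

cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # cyclic permutation matrix
upper = [[1, 1], [0, 1]]                    # upper triangular, hence reducible

def transpose(A):
    return [list(col) for col in zip(*A)]

checks = (is_irreducible(cycle), is_irreducible(upper),
          is_irreducible(transpose(cycle)), is_irreducible(transpose(upper)))
```

The last two checks are consistent with the exercise above: a matrix and its transpose are irreducible or reducible together.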


8.4.3 Lemma. If $A \in M_n$, $A \ge 0$, and $A^k > 0$ for some $k \ge 1$,
then $\rho(A)$ is an algebraically simple eigenvalue of $A$.

Proof: If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$, then
$\lambda_1^k, \ldots, \lambda_n^k$ are the eigenvalues of $A^k$. We know
that $\rho(A)$ is an eigenvalue of $A$ by Theorem (8.3.1), so if $\rho(A)$
were a multiple eigenvalue of $A$, then $\rho(A)^k = \rho(A^k)$ would be a
multiple eigenvalue of $A^k$. But this is impossible since $\rho(A^k)$ is a
simple eigenvalue of $A^k$ by Theorem (8.2.10). □

Now we shall see how much of Perron's theorem generalizes to nonnegative
irreducible matrices. The name of Frobenius is associated with
generalizations of Perron's results about positive matrices to nonnegative
matrices.

8.4.4 Theorem. Let $A \in M_n$ and suppose that $A$ is irreducible and
nonnegative. Then

(a) $\rho(A) > 0$;
(b) $\rho(A)$ is an eigenvalue of $A$;
(c) There is a positive vector $x$ such that $Ax = \rho(A)x$; and
(d) $\rho(A)$ is an algebraically (and hence geometrically) simple
    eigenvalue of $A$.

Proof: Corollary (8.1.25) shows that (a) holds under conditions even
weaker than irreducibility. Assertion (b) holds for all nonnegative
matrices $A$ by Theorem (8.3.1), which also guarantees that there is a
nonnegative vector $x \ne 0$ such that $Ax = \rho(A)x$. But then
$(I + A)^{n-1}x = [1 + \rho(A)]^{n-1}x$, and since the matrix
$(I + A)^{n-1}$ is positive by Lemma (8.4.1), we see that the vector
$(I + A)^{n-1}x$ must be positive by (8.1.14). Thus,
$x = [1 + \rho(A)]^{1-n}(I + A)^{n-1}x > 0$. To prove (d) we apply Lemma
(8.4.2) to show that if $\rho(A)$ is a multiple eigenvalue of $A$, then
$1 + \rho(A) = \rho(I + A)$ is a multiple eigenvalue of $I + A$. But
$I + A \ge 0$ and $(I + A)^{n-1} > 0$ by Lemma (8.4.1), so $1 + \rho(A)$
must be a simple eigenvalue of $I + A$ by Lemma (8.4.3). □

The theorem guarantees that the eigenspace of an irreducible nonnegative
matrix associated with the Perron root is one-dimensional. For an
irreducible nonnegative matrix, the unique positive eigenvector whose
components sum to 1 is called the Perron vector.

Since an irreducible nonnegative matrix has a positive eigenvector, the
results at the end of Section (8.1) apply to this class of matrices. Of
particular importance is the variational characterization (8.1.32) of the
spectral radius. Moreover, $A^T$ is irreducible if and only if $A$ is
irreducible, so an irreducible nonnegative matrix also has a positive left
eigenvector. Thus, Theorem (8.3.4) holds for nonnegative irreducible
matrices. This fact is crucial in the following extension of Theorem
(8.1.18).

8.4.5 Theorem. Let $A, B \in M_n$. Assume that $A$ is nonnegative and
irreducible, and assume that $A \ge |B|$. Then $\rho(A) \ge \rho(B)$. If
$\rho(A) = \rho(B)$ and if $\lambda = e^{i\varphi}\rho(B)$ is an eigenvalue
of $B$, then there exist $\theta_1, \ldots, \theta_n \in R$ such that
$B = e^{i\varphi} D A D^{-1}$, where
$D = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$.

Proof: From Theorem (8.1.18) we know already that if $A \ge |B|$, then
$\rho(A) \ge \rho(B)$. If $\rho(A) = \rho(B)$, then there exists some
$x \ne 0$ such that $Bx = \lambda x$ with $|\lambda| = \rho(B) = \rho(A)$,
and so

$$\rho(A)|x| = |\lambda x| = |Bx| \le |B|\,|x| \le A|x|$$

Since $A$ is irreducible, we conclude from Theorem (8.3.4) that
$A|x| = \rho(A)|x|$ and hence $|Bx| = |B|\,|x| = A|x|$. Furthermore, (c) and
(d) of Theorem (8.4.4) imply that $|x| > 0$ as well, and since $|B| \le A$
it follows from (8.1.15) and the fact that $|B|\,|x| = A|x|$ that
$|B| = A$. If we define $\theta_k \in R$ by
$e^{i\theta_k} = x_k/|x_k|$, $k = 1, \ldots, n$, if
$\lambda = e^{i\varphi}\rho(A)$, and if we set
$D = \mathrm{diag}(e^{i\theta_1}, \ldots, e^{i\theta_n})$, then $x = D|x|$
and $\lambda x = e^{i\varphi}\rho(A)D|x| = BD|x| = Bx$. Thus,
$e^{-i\varphi}D^{-1}BD|x| = \rho(A)|x| = A|x|$. This identity, together
with the fact that $|x| > 0$ and $|e^{-i\varphi}D^{-1}BD| = A$, implies
that $e^{-i\varphi}D^{-1}BD = A$. □

Exercise. Supply the details for the last part of the preceding proof.
Hint: Let $C = e^{-i\varphi}D^{-1}BD$ and observe that

$$A|x| = C|x| = |C|x|| \le |C|\,|x| = A|x|$$

so that equality holds in the triangle inequality,
$\arg(c_{ij}|x_j|) = $ constant, $c_{ij} \ge 0$, and $c_{ij} = a_{ij}$.

When $A > 0$, we know from Perron's theorem that $\rho(A)$ is the unique
eigenvalue of $A$ of largest modulus. When $A \ge 0$ there may be more than
one eigenvalue of maximum modulus, but in this case $A$ must have a special
form and these eigenvalues must be located in a very regular pattern.


8.4.6 Corollary. Let $A \in M_n$, suppose $A$ is nonnegative and
irreducible, and suppose the set
$S \equiv \{\lambda_n = \rho(A), \lambda_{n-1}, \ldots, \lambda_{n-k+1}\}$
of eigenvalues of maximum modulus $\rho(A)$ has exactly $k$ distinct
elements. Then each eigenvalue $\lambda_i \in S$ has algebraic
multiplicity 1 and

$$S = \{ e^{2\pi i p/k} \rho(A) : p = 0, 1, \ldots, k-1 \}$$

that is, these maximum modulus eigenvalues are precisely the $k$ $k$th
roots of unity times $\rho(A)$. Moreover, if $\lambda$ is any eigenvalue of
$A$, then $e^{2\pi i p/k}\lambda$ is an eigenvalue for all
$p = 0, 1, \ldots, k-1$.

Proof: For each eigenvalue in $S$, write
$\lambda_{n-p} = e^{i\varphi_p}\rho(A)$, $p = 0, 1, \ldots, k-1$, that is,
$\varphi_p = \arg(\lambda_{n-p})$. Assume that $k > 1$, relabel the
eigenvalues if necessary, and redefine the arguments if necessary so that
$0 = \varphi_0 < \varphi_1 < \varphi_2 < \cdots < \varphi_{k-1} < 2\pi$.
Applying the preceding theorem with $B = A$ and $\lambda = \lambda_{n-p}$,
we find that $A = B = e^{i\varphi_p} D_p A D_p^{-1}$ for
$p = 0, 1, \ldots, k-1$. Since $D_p A D_p^{-1}$ is similar to $A$, it has
the same eigenvalues as $A$, so this identity shows that the set of
eigenvalues of $A$ is carried into itself if it is rotated in the complex
plane by the angle $\varphi_p$ for any $p = 0, 1, \ldots, k-1$; this is the
last assertion above (provided we show that $\varphi_p = 2\pi p/k$).
Furthermore, $\lambda_n = \rho(A)$ is known to be an algebraically simple
eigenvalue of $A$ (since $A$ is irreducible), so letting
$p = 1, 2, \ldots, k-1$ in succession, we conclude that all
$\lambda_{n-p}$ are algebraically simple as well.

Slightly more can be said, however. Since
$S = \{\lambda_n, \lambda_{n-1}, \ldots, \lambda_{n-k+1}\} =
\{e^{i\varphi_p}\lambda_n, e^{i\varphi_p}\lambda_{n-1}, \ldots,
e^{i\varphi_p}\lambda_{n-k+1}\}$ for each $p = 0, 1, \ldots, k-1$, there
must be some $q = q(p)$ such that
$\rho(A) = \lambda_n = e^{i\varphi_p}\lambda_{n-q}$; that is, for each $p$
there is some $q = q(p)$ such that $\varphi_p = 2\pi - \varphi_q$ [i.e.,
$\varphi_p = -\varphi_q \pmod{2\pi}$], and so
$e^{-i\varphi_p}\rho(A) \in S$. Moreover, if we iterate the representation
for $A$ given by Theorem (8.4.5) and take
$B = e^{i\varphi_r} D_r A D_r^{-1}$ and
$\lambda = \lambda_{n-m} = e^{i\varphi_m}\rho(A)$ for any choice of $r, m$
with $0 \le r, m \le k-1$, we find that

$$A = e^{i\varphi_r} D_r (e^{i\varphi_m} D_m A D_m^{-1}) D_r^{-1}
= e^{i(\varphi_r + \varphi_m)} (D_r D_m) A (D_r D_m)^{-1}$$

so that, by the same argument as above, the set of eigenvalues of $A$ is
rotated into itself by a rotation of angle $\varphi_m + \varphi_r$ in the
complex plane. In particular,
$\lambda_n e^{i(\varphi_m + \varphi_r)} =
e^{i(\varphi_m + \varphi_r)}\rho(A)$ must be an eigenvalue of $A$ (of
maximum modulus), so for some $j = j(m, r)$ we must have
$\varphi_m + \varphi_r = \varphi_j \pmod{2\pi}$.

Consider the set
$G \equiv \{\varphi_0 = 0, \varphi_1, \ldots, \varphi_{k-1}\} \subset
[0, 2\pi)$. The preceding paragraph contains the information that (a)
$0 \in G$; (b) if $\varphi_i, \varphi_j \in G$, then
$\varphi_i + \varphi_j \pmod{2\pi} \in G$; (c) if $\varphi_i \in G$, then
$-\varphi_i \pmod{2\pi} \in G$. Moreover, it is clear that (d) if
$\varphi_i, \varphi_j \in G$, then
$\varphi_i + \varphi_j = \varphi_j + \varphi_i \pmod{2\pi}$. Thus, $G$ is
an Abelian group with exactly $k$ elements; the group operation is
"addition modulo $2\pi$." Since the order of any element of a finite
Abelian group must divide the order of the group, every $e^{i\varphi_m}$
must be a $p$th root of unity, where $p = p(m)$ is a divisor of $k$. We can
prove this, and more, with a direct argument that makes no use of group
theory.

Since $\varphi_r + \varphi_m = \varphi_j \pmod{2\pi}$ for some $j$ for
each $r$ and $m$, by induction we find that (setting $r = m = 1$)
$r\varphi_1 \in G$ for all $r = 0, 1, 2, \ldots \pmod{2\pi}$; that is,
$e^{ir\varphi_1}\rho(A) \in S$ for all $r = 0, 1, 2, \ldots$. But if
$e^{i\varphi_1}$ were not a root of unity, this would imply that there are
infinitely many distinct elements of $S$, which is absurd. Thus,
$e^{ip\varphi_1} = 1$ for some $p$ with $1 < p \le k$; we may assume that
$p$ is the smallest such integer for which this is correct. Recall that
$0 < \varphi_1 < \varphi_2 < \cdots < \varphi_{k-1} < 2\pi$, and for some
fixed index $m$ consider $\varphi_m$. The interval $[0, 2\pi)$ is divided
into exactly $p$ half-open subintervals (open at the right) of length
$2\pi/p$ by the $p+1$ points
$0, \varphi_1, 2\varphi_1, \ldots, (p-1)\varphi_1, p\varphi_1 = 2\pi$; the
point $\varphi_m$ must lie in one of these subintervals. Thus, there is
some $q$ with $0 \le q \le p-1$ such that
$q\varphi_1 \le \varphi_m < (q+1)\varphi_1$; that is,
$0 \le \varphi_m - q\varphi_1 < \varphi_1$. Therefore, we must have
$\varphi_m - q\varphi_1 = \varphi_j$ for some $j = j(m)$ because we have
already shown that if $e^{i\varphi_1}\rho(A)$ is an eigenvalue, then so are
$e^{-i\varphi_1}\rho(A)$, $e^{-iq\varphi_1}\rho(A)$, and
$e^{i(\varphi_m - q\varphi_1)}\rho(A)$. But then
$0 \le \varphi_m - q\varphi_1 = \varphi_j < \varphi_1$ and $\varphi_1$ was
chosen to be the least nonzero argument, so we must have
$\varphi_m - q\varphi_1 = 0$. This shows that every argument $\varphi_m$ is
some multiple of $\varphi_1$, so it must be that $p = k$; that is,
$\varphi_1 = 2\pi/k$, since if $p < k$, there would be fewer than $k$
distinct elements in the set
$\{e^{i\varphi_1}\rho(A), e^{2i\varphi_1}\rho(A), e^{3i\varphi_1}\rho(A),
\ldots\}$ which, nevertheless, must equal all of $S$. Finally, since each
$\varphi_m$ is some multiple of $\varphi_1 = 2\pi/k$ and since there are
$k$ distinct $\varphi_j$ terms and $k$ distinct multiples of $\varphi_1$,
it must be that $\varphi_m = m\varphi_1$ for all $m = 0, 1, 2, \ldots, k-1$.

The entire argument has proceeded on the assumption that $k > 1$, but
if $k = 1$ the assertions made are trivial. □


8.4.7 Remark. If $A \ge 0$ is irreducible and has $k > 1$ eigenvalues of
maximum modulus, then each nonzero eigenvalue of $A$ lies on a circle
centered at 0 in $C$ that passes through exactly $k$ eigenvalues of $A$,
all equally spaced around the circle. In particular, $k$ must be a divisor
of the number of nonzero eigenvalues of $A$. Thus, if $A$ is a nonsingular
$n$-by-$n$ nonnegative irreducible matrix with $n$ a prime, there must be
either one or $n$ eigenvalues of maximum modulus; there are no other
possibilities.
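As a small numerical illustration of Corollary (8.4.6), consider a cyclic permutation matrix, which is nonnegative and irreducible with $\rho = 1$. The sketch below assumes $k = 5$ and checks directly that each $k$th root of unity $\lambda$ is an eigenvalue with eigenvector $(1, \lambda, \lambda^2, \ldots, \lambda^{k-1})$; the choice of matrix and the variable names are illustrative assumptions.

```python
import cmath

k = 5  # assumed size; any k >= 2 works the same way

# k-by-k cyclic permutation matrix: row i has its single 1 in column (i+1) mod k.
C = [[1 if j == (i + 1) % k else 0 for j in range(k)] for i in range(k)]

residuals = []
moduli = []
for p in range(k):
    lam = cmath.exp(2j * cmath.pi * p / k)      # the p-th k-th root of unity
    v = [lam ** i for i in range(k)]            # candidate eigenvector
    Cv = [sum(C[i][j] * v[j] for j in range(k)) for i in range(k)]
    # Largest entry of C v - lam v; it should vanish up to rounding.
    residuals.append(max(abs(Cv[i] - lam * v[i]) for i in range(k)))
    moduli.append(abs(lam))
```

All $k$ eigenvalues here have modulus 1 and are equally spaced on the unit circle, so for this matrix every eigenvalue is of maximum modulus, which is one of the two possibilities the remark allows when $n = k$ is prime and $A$ is nonsingular.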


8.4.8 Corollary. Suppose $A \in M_n$ is nonnegative and irreducible, and
denote $A^m \equiv [a_{ij}^{(m)}]$ for $m = 1, 2, \ldots$. If there are
precisely $k > 1$ eigenvalues of $A$ of maximum modulus, then
$a_{ii}^{(m)} = 0$ for all $i = 1, 2, \ldots, n$ whenever $m$ is not an
integral multiple of $k$. In particular, all $a_{ii} = 0$.

Proof: Use Corollary (8.4.6) to choose an eigenvalue
$\lambda = e^{i\varphi}\rho(A)$ of $A$ of maximum modulus with
$\varphi = 2\pi/k$. Thus, $e^{im\varphi}$ is not real and positive whenever
$m$ is not an integral multiple of $k$. Using Theorem (8.4.5) with $B = A$
and $\lambda = e^{i\varphi}\rho(A)$, we find that
$A = e^{i\varphi}DAD^{-1}$, so $A^m = e^{im\varphi}DA^mD^{-1}$ and
$a_{ii}^{(m)} = e^{im\varphi}a_{ii}^{(m)}$ for all $i = 1, \ldots, n$ and
all $m = 1, 2, 3, \ldots$. If $e^{im\varphi}$ is not real and positive, this
is impossible if $a_{ii}^{(m)} > 0$, so we must have $a_{ii}^{(m)} = 0$ for
all $i = 1, \ldots, n$ whenever $m$ is not a multiple of $k$. □

Exercise. Suppose that $A \in M_n$ is nonnegative and irreducible. Show
that in order to guarantee that $\rho(A)$ is the unique eigenvalue of $A$ of
maximum modulus, it is sufficient to have some $a_{ii} \ne 0$. However,
consider the matrix

$$\begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

and show that this condition is not necessary. Can you find a 2-by-2
counterexample?

8.4.9 Remark. A result sharper than Corollary (8.4.8) is known. If
$A \ge 0$ is irreducible and has $k > 1$ eigenvalues of maximum modulus,
then there is a permutation matrix $P$ such that

$$PAP^T = \begin{bmatrix}
0 & A_{12} & & 0 \\
 & 0 & \ddots & \\
 & & \ddots & A_{k-1,k} \\
A_{k1} & & & 0
\end{bmatrix}$$

where the $k$ main diagonal zero blocks are square and the blocks $A_{ij}$
shown are not necessarily zero. In particular, all the main diagonal
entries $a_{ii}$ must vanish [Var, p. 28].

Although the hypothesis of irreducibility is essential to get the regular
pattern of maximum modulus eigenvalues described in Corollary (8.4.6),
one can get some information in the general case.

8.4.10 Corollary. If $A \in M_n$ and $A \ge 0$, if $\rho(A) > 0$, and if
$\lambda$ is an eigenvalue of $A$ such that $|\lambda| = \rho(A)$, then
$\lambda/\rho(A) = e^{i\varphi}$ is a root of unity, $e^{ik\varphi} = 1$
for some $k$ with $1 \le k \le n$, and $e^{ip\varphi}\rho(A)$ is an
eigenvalue of $A$ for all $p = 0, 1, 2, \ldots, k-1$.

Proof: If $A$ is irreducible, then the assertion follows from Corollary
(8.4.6). If $A$ is not irreducible, then by a permutation similarity, $A$
can be brought into the block upper triangular form

$$\begin{bmatrix}
A_1 & & & * \\
 & A_2 & & \\
 & & \ddots & \\
0 & & & A_k
\end{bmatrix}$$

where each $A_i$ is a square matrix that is either irreducible or zero. The
eigenvalues of $A$ are the union of the eigenvalues of the diagonal block
matrices $A_1, \ldots, A_k$, and the structure of the set of maximum
modulus eigenvalues of each $A_i$ is given by Corollary (8.4.6). □

Exercise. Let

$$A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\quad \text{and} \quad
A_2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$$

and consider

$$A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}$$

to show that for general nonnegative $A$ there can be more eigenvalues of
maximum modulus than just $\rho(A)$ and a single set of rotations of
$\rho(A)$ through powers of a single root of unity.


Problems

1. Show by examples that the items in Perron's theorem (8.2.11) that are
not included in Theorem (8.4.4) are not generally true of irreducible
nonnegative matrices.

2. Show by example that $\rho(I + A) = 1 + \rho(A)$ is not true for all
$A \in M_n$. Give a necessary and sufficient condition on $A$ for this
identity to be correct.

3. Irreducibility is a sufficient but not necessary condition that a
nonnegative matrix have a positive eigenvector. Consider [ H | and [ a] to
show that a reducible nonnegative matrix may or may not have a positive
eigenvector.

4. If $A \ge 0$ is irreducible, show that the entries of the matrices
$[\rho(A)^{-1}A]^m$ are uniformly bounded as $m \to \infty$.


5. We have shown that an irreducible matrix has a positive Perron
vector. Suppose that $A \ge 0$, $\rho(A) > 0$, $x \ge 0$, $x \ne 0$, and
$Ax = \rho(A)x$; show that if $x$ is not positive, then $A$ is reducible.
If $x$ is positive, must $A$ be irreducible?

6. Suppose $A \ge 0$ is irreducible and that $B \ge 0$ commutes with $A$.
If $x$ is the Perron vector of $A$, show that $Bx = \rho(B)x$. Hint:
Theorem (8.4.4d).


7. Show that the companion matrix of the polynomial $x^k - 1 = 0$ is an
example of a $k$-by-$k$ nonnegative matrix with $k$ eigenvalues of maximum
modulus. Sketch the location of these eigenvalues in the complex plane.

8. Let $k_1, k_2, \ldots, k_p$ be given positive integers. Show how to
construct a single nonnegative matrix whose eigenvalues of maximum modulus
are precisely the $k_1$ $(k_1)$th roots of unity, the $k_2$ $(k_2)$th roots
of unity, ..., and the $k_p$ $(k_p)$th roots of unity.

9. Explain why an irreducible nonnegative matrix $A$ is said to be cyclic
of index $k$ if it has $k \ge 1$ eigenvalues of maximum modulus.

10. If $A \ge 0$ is irreducible and is cyclic of index $k \ge 1$, show that
the characteristic polynomial
$p_A(t) = t^r (t^k - \rho(A)^k)(t^k - \mu_2^k) \cdots (t^k - \mu_m^k)$ for
some $r, m \ge 0$ and some complex numbers $\mu_i$ with
$|\mu_i| < \rho(A)$, $i = 2, \ldots, m$. Comment on the pattern of zero and
nonzero coefficients in $p_A(t)$ and give a criterion for $A$ to have only
one eigenvalue of maximum modulus based on the form of the characteristic
polynomial. Hint: In the proof of Corollary (8.4.6) we found that if
$\varphi = 2\pi/k$ and if $\lambda$ is an eigenvalue of $A$, then so are
$e^{ir\varphi}\lambda$, $r = 0, 1, 2, \ldots$.

11. Let $n > 1$ be a prime number. Show that if $A \in M_n$ is nonnegative,
irreducible, and nonsingular, either $\rho(A)$ is the only eigenvalue of
$A$ of maximum modulus or all the eigenvalues of $A$ have maximum modulus.

12. Consider A= E | and show that the conclusion of Corollary
(8.4.8) cannot be improved in general to assert that the main diagonals of
all powers of $A$ must vanish.

13. If $A \ge 0$, show that irreducibility of $A$ depends only on the
location of the zero entries and not on the magnitude of the nonzero
entries.

14. If $A, B \in M_n$, then $AB$ and $BA$ have the same set of eigenvalues.
Consider lo | and 9 | to show that even if $A$ and $B$ are nonnegative,
it is possible to have $AB$ irreducible and $BA$ reducible. This example
shows that an irreducible matrix can be similar (even unitarily
equivalent) to a reducible matrix; explain why. It also shows that no
condition involving only the eigenvalues of a matrix can be a definitive
test for irreducibility.

15. Let $A \in M_n$ be a given irreducible nonnegative matrix. Show that
$A + B$ is irreducible whenever $B \in M_n$ is nonnegative, and that
$\rho(A + B) > \rho(A)$ whenever $B \ge 0$ and $B \ne 0$. This is an
improvement of (8.1.18) to strict monotonicity, but with the additional
assumption of irreducibility. Hint: (8.1.18) says that
$\rho(A + B) \ge \rho(A)$. If equality holds, use (8.4.5) to show that
$B = 0$.


16. Show that (8.4.1) can be sharpened in the following way. Let $A \in M_n$
be nonnegative and let the minimal polynomial of $A$ have degree $m$.
Show that $A$ is irreducible if and only if $(I+A)^{m-1} > 0$. Hint: Consider
$I + A + A^2 + \cdots + A^{m-1} + A^m + \cdots + A^{n-1}$ and use the minimal
polynomial to express $A^m$ and higher powers in terms of $I, A, \dots, A^{m-1}$.


17. Let $A \in M_n$ be a given nonnegative matrix and consider the problem
of finding a best rank 1 approximation to $A$ in the sense of least squares;
that is, find a rank 1 $X \in M_n$ such that
$\|A - X\|_2 = \min\{\|A - Y\|_2 : Y \in M_n \text{ has rank } 1\}$.
Suppose that the Perron root of $AA^T$ is simple, which would
be the case if either $AA^T$ or $A^TA$ is irreducible. Why? Show that such a
best $X$ is nonnegative, unique, and given by $X = \sqrt{r}\, v w^T$, where
$r = \rho(AA^T)$ is the Perron root of $AA^T$, and $v, w \in \mathbf{R}^n$ are nonnegative unit
vectors that are, respectively, unit eigenvectors of $AA^T$ and $A^TA$
associated with the eigenvalue $r$. Hint: Use the characterization of a best rank
1 approximation given in (7.4.1). Notice that $AA^T$ and $A^TA$ are both real
symmetric positive semidefinite matrices, so the computation of $r$, $v$, and
$w$ is, in principle, not too difficult.


18. Use Problem 17 to find a best rank 1 least-squares approximation to
each of the matrices


A= | 1 1 l 1 1 0 0 
1 1 0 1)’ 1 1 
Show that a best rank 1 least-squares approximation to $A = I \in M_n$ is not
unique; $X = vv^*$ is a best rank 1 least-squares approximation to $I$ for any
unit vector $v \in \mathbf{C}^n$.


8.5 Primitive matrices 


In practice, the result in Perron's theorem that may have the most
frequent application is the limit statement in Theorem (8.2.8). An
examination of Theorem (8.4.4) shows that the only hypothesis lacking for an
application of Lemma (8.2.7) to irreducible matrices is the condition
that the spectral radius is the only eigenvalue of maximum modulus.
Since $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is an example of a nonnegative irreducible matrix with
two eigenvalues of maximum modulus (and for which $\lim_{m \to \infty} A^m$ does
not exist), some further restriction of the class of irreducible matrices is
necessary; the most economical procedure is to assume exactly what we
need.


8.5.0 Definition. A nonnegative matrix $A \in M_n$ is said to be primitive
if it is irreducible and has only one eigenvalue of maximum modulus.

The notion of primitivity is due to Frobenius (1912). The limit result
now follows directly from Lemma (8.2.7) with the same proof as for
Theorem (8.2.8).


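8.5.1 Theorem. If $A \in M_n$ is nonnegative and primitive, then

\[ \lim_{m \to \infty} [\rho(A)^{-1}A]^m = L > 0 \]

where $L = xy^T$, $Ax = \rho(A)x$, $A^Ty = \rho(A)y$, $x > 0$, $y > 0$, and $x^Ty = 1$.
Moreover, if $\lambda_{n-1}$ is an eigenvalue of $A$ such that $|\lambda| \le |\lambda_{n-1}|$ for every
eigenvalue $\lambda \ne \rho(A)$, and if $|\lambda_{n-1}|/\rho(A) < r < 1$, then there exists a
constant $C = C(r, A)$ such that $\|[\rho(A)^{-1}A]^m - L\|_\infty \le C r^m$ for all
$m = 1, 2, \dots$. □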
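The limit and the geometric error bound in Theorem (8.5.1) can be checked on a small example. The sketch below is only an illustration: the matrix $A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}$ is chosen here (it does not appear in the text) because its eigenvalues are $3$ and $-2$, so $\rho(A) = 3$, $x = (1, 1)^T$, $y = (3/5, 2/5)^T$, and everything can be computed exactly with rational arithmetic. The helper names `matmul` and `max_row_sum_norm` are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_row_sum_norm(X):
    """Maximum absolute row sum, i.e. the matrix infinity norm."""
    return max(sum(abs(v) for v in row) for row in X)

# Illustrative primitive matrix (an assumption of this sketch).
A = [[1, 2], [3, 0]]
rho = 3
# L = x y^T with x = (1, 1), y = (3/5, 2/5), so that x^T y = 1.
L = [[Fraction(3, 5), Fraction(2, 5)], [Fraction(3, 5), Fraction(2, 5)]]
T = [[Fraction(a, rho) for a in row] for row in A]  # rho(A)^{-1} A

# errors[m-1] is the infinity norm of [rho(A)^{-1} A]^m - L.
errors = []
P = T
for m in range(1, 11):
    D = [[P[i][j] - L[i][j] for j in range(2)] for i in range(2)]
    errors.append(max_row_sum_norm(D))
    P = matmul(P, T)
```

For this particular matrix the error equals $(6/5)(2/3)^m$ exactly, so the bound of the theorem holds with $C = 6/5$ for every $r$ with $2/3 < r < 1$.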


We have now generalized all of Perron’s theorem from the class of 
positive matrices to the class of primitive nonnegative matrices. In prac- 
tice, however, one still faces the problem of testing a given nonnegative 
matrix for primitivity; ideally, one would hope to be able to do so with- 
out explicit calculation of the eigenvalues. The following characteriza- 
tion of primitivity, while not itself a computationally effective test, leads 


to several useful criteria. 


8.5.2 Theorem. If $A \in M_n$ is nonnegative, then $A$ is primitive if and
only if $A^m > 0$ for some $m \ge 1$.


Proof: If $A \ge 0$ and $A^m > 0$, then from every node $P_i$ of the directed graph
$\Gamma(A)$ of $A$ to every other node $P_j$ there must be a directed path of exact
length $m$ (Corollary 6.2.18). Since this is a stronger property than
irreducibility, $A$ must be irreducible. Application of Perron's theorem (8.2.11d, e)
to $A^m$, as in (8.4.3), then implies that $A$ must be primitive. Conversely, if
$A$ is primitive, then $\lim_{m \to \infty} [\rho(A)^{-1}A]^m = L > 0$ by Theorem (8.5.1), and
so for some $m \ge 1$ it must be that $[\rho(A)^{-1}A]^m > 0$. □


This characterization together with the very sharp information we 
have about the maximum modulus eigenvalues of nonnegative irreduc- 
ible matrices now gives us a graphical criterion for primitivity reminiscent 
of the graphical criterion for irreducibility. Recall that the greatest 
common divisor (g.c.d.) of a sequence of positive integers $k_1, k_2, \dots$ is the
largest integer $k \ge 1$ such that $k$ is a divisor of all $k_1, k_2, \dots$.


8.5.3 Theorem. Let $A \in M_n$ be nonnegative and irreducible, and let
$\{P_1, \dots, P_n\}$ denote the set of nodes of the directed graph $\Gamma(A)$. Denote by
$L_i = \{k_1^{(i)}, k_2^{(i)}, \dots\}$ the set of lengths of all directed paths in $\Gamma(A)$ that
both start and end at the node $P_i$, $i = 1, 2, \dots, n$. Denote by $g_i$ the greatest
common divisor of all the lengths in $L_i$. Then $A$ is primitive if and only if
all $g_i = 1$, $i = 1, 2, \dots, n$.


Proof: Observe that no set of lengths $L_i$ is empty since $A$ is irreducible:
for each $i$ and for any $j \ne i$ there is a path in $\Gamma(A)$ joining $P_i$ to $P_j$ and
there is also a path in $\Gamma(A)$ joining $P_j$ to $P_i$. If $A$ is primitive then by
Theorem (8.5.2) there is some $m \ge 1$ such that $A^m > 0$, and hence $A^k > 0$
for all $k \ge m$. But then $m, m+1, m+2, \dots \in L_i$ for all $i = 1, \dots, n$, and
hence $g_i = 1$ for all $i = 1, \dots, n$.

Now suppose $A = [a_{ij}]$ is not primitive, and write $A^m = [a_{ij}^{(m)}]$. If $A$ has
exactly $k > 1$ eigenvalues of maximum modulus, then by Corollary (8.4.8) we know that
$a_{ii}^{(m)} = 0$ for all $i = 1, \dots, n$ and for all $m$ such that $m$ is not an integral multiple of
$k$. Thus, $L_i \subset \{k, 2k, 3k, \dots\}$, and hence $g_i \ge k > 1$ for all $i = 1, \dots, n$. □


8.5.4 Remark. Somewhat more than the assertions in Theorem (8.5.3)
is true; in fact, $g_1 = g_2 = \cdots = g_n$ always, and the common value of the $g_i$
terms is precisely the number of eigenvalues of $A$ of maximum modulus.
This is a theorem of Romanovsky.
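The graphical criterion of Theorem (8.5.3) can be explored numerically. The following is a minimal sketch, not a method given in the text: it works with the zero pattern of $A$ only, forms successive Boolean powers, and for each node takes the g.c.d. of those lengths $m$ for which the $i,i$ entry of $A^m$ is positive. The cutoff of $3n$ on the lengths examined is an assumption of this sketch (for an irreducible matrix, closed walks of length at most $3n$ through a given node already determine the g.c.d.); the function names are hypothetical.

```python
from math import gcd

def bool_matmul(X, Y):
    """Boolean product: entry (i, j) is True when some path i -> k -> j exists."""
    n = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def closed_walk_gcds(A, max_len=None):
    """For each node i, gcd of the lengths m <= max_len with (A^m)_ii > 0."""
    n = len(A)
    if max_len is None:
        max_len = 3 * n  # cutoff assumed sufficient for irreducible A
    pattern = [[a > 0 for a in row] for row in A]
    power = pattern  # Boolean pattern of A^m, starting with m = 1
    g = [0] * n      # gcd(0, m) == m, so 0 is a neutral starting value
    for m in range(1, max_len + 1):
        for i in range(n):
            if power[i][i]:
                g[i] = gcd(g[i], m)
        power = bool_matmul(power, pattern)
    return g

# A cyclic permutation matrix: every closed walk has length divisible by 3.
cyclic3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# A 3-by-3 matrix with cycles of lengths 2 and 3, hence primitive.
mixed3 = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]
```

Here `closed_walk_gcds(cyclic3)` gives 3 at every node, matching the three eigenvalues of maximum modulus of a 3-cycle, while `closed_walk_gcds(mixed3)` gives 1 at every node, consistent with Remark (8.5.4) that all the $g_i$ coincide.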

The following result is useful in many situations; in particular, it 
shows that an irreducible nonnegative matrix with positive main diagonal 
must be primitive. 


8.5.5 Lemma. If $A \in M_n$ is nonnegative and irreducible, and if all the
main diagonal entries of $A$ are positive, then $A^{n-1} > 0$.


Proof: If $\alpha = \min\{a_{11}, a_{22}, \dots, a_{nn}\}$, and if we define

\[ B = A - \operatorname{diag}(a_{11}, a_{22}, \dots, a_{nn}) \]

then $B$ is nonnegative and irreducible (because $A$ is irreducible), and
$A \ge \alpha I + B = \alpha[I + (1/\alpha)B]$ and hence
$A^{n-1} \ge \alpha^{n-1}[I + (1/\alpha)B]^{n-1} > 0$ by Lemma (8.4.1). □


Exercise. Note that as a nonnegative square matrix with positive diagonal
entries is powered, any entry that becomes positive remains positive
in all successive powers.


Although an irreducible matrix may have a reducible power, all powers 
of a primitive matrix are primitive. 


8.5.6 Lemma. Let $A \in M_n$ be nonnegative and primitive. Then $A^k$ is
nonnegative, irreducible, and primitive for all $k = 1, 2, \dots$.


Proof: Since all sufficiently large powers of $A$ are positive, the same is
true for $A^k$ for any $k$. If $A^k$ were reducible for some $k$, then all powers of
$A^k$ would also be reducible and hence could not be positive. Since this
contradicts the fact that all sufficiently large powers of $A$ are positive, it is
impossible for any power of $A$ to be reducible. □


The characterization in Theorem (8.5.2) is not in itself a computationally
effective test for primitivity since no upper bound on the powers
to be computed is given. If one finds an $m$ such that $A^m > 0$, then $A$ is
primitive; but when does one stop computing if one has not yet found
a positive power? A finite bound which answers this question is given by
the following theorem.


8.5.7 Theorem. Let $A \in M_n$ be nonnegative. If $A$ is primitive, then
$A^k > 0$ for some positive integer $k \le (n-1)n^n$.


Proof: Because $A$ is irreducible, there is a directed path from the node $P_1$
in $\Gamma(A)$ back to node $P_1$; the shortest such path has length $k_1 \le n$. The
matrix $A^{k_1}$ therefore has a positive entry in its 1,1 position, and any
power of $A^{k_1}$ will also have a positive 1,1 entry. Because $A$ is primitive,
$A^{k_1}$ must be irreducible by Lemma (8.5.6), and so there is a directed path
from the node $P_2$ in $\Gamma(A^{k_1})$ back to the node $P_2$; the shortest such path
has length $k_2 \le n$. The matrix $(A^{k_1})^{k_2} = A^{k_1 k_2}$ therefore has positive 1,1
and 2,2 entries. This process can be continued down the main diagonal
until we obtain a matrix $A^{k_1 k_2 \cdots k_n}$ (with each $k_i \le n$), which is irreducible
and has positive diagonal entries, and hence $[A^{k_1 k_2 \cdots k_n}]^{n-1} > 0$ by
Lemma (8.5.5). Since

\[ k_1 k_2 \cdots k_n (n-1) \le n \cdot n \cdots n\,(n-1) = n^n (n-1) \]

we are done. □


If $A$ is a given primitive matrix, the least $k$ such that $A^k > 0$ is called
the index of primitivity of $A$ and is usually denoted by $\gamma(A)$. We have
seen that $\gamma(A) \le n - 1$ if $A$ has a positive diagonal and $\gamma(A) \le n^n(n-1)$ in
general. The latter bound can be improved considerably.


8.5.8 Theorem. Let $A \in M_n$ be a nonnegative primitive matrix, and
suppose the shortest simple directed cycle in $\Gamma(A)$ has length $s$. Then
$A^{n+s(n-2)} > 0$; that is, $\gamma(A) \le n + s(n-2)$.


Proof: Because $A$ is irreducible, every node in $\Gamma(A)$ lies on a cycle and the
shortest cycle from any node back to itself will be a simple cycle of length
at most $n$. By a permutation, we may assume that the nodes in the shortest
such cycle are $P_1, P_2, \dots, P_s$. Notice that $n + s(n-2) = n - s + s(n-1)$
and consider $A^{n-s+s(n-1)} = A^{n-s}(A^s)^{n-1}$. Write $A^{n-s}$ in block form

\[ A^{n-s} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} \]

with $X_{11} \in M_s$ and $X_{22} \in M_{n-s}$. Then $X_{11}$ has at least one nonzero entry
in each row because the nodes $P_1, \dots, P_s$ comprise a cycle in $\Gamma(A)$ and
hence from each node $P_i$ in the graph $\Gamma(A^{n-s})$ there is some arc to some
node $P_j$ (perhaps $P_i = P_j$) in $\Gamma(A^{n-s})$; this is correct if $1 \le i, j \le s$. There
is at least one nonzero entry in each row of $X_{21}$ because for each node
$P_{s+1}, \dots, P_n$ not in the cycle there must be a directed path in $\Gamma(A)$ of
length not more than $n - s$ (the number of nodes not in the cycle) to some
node in the cycle. By then going a sufficient number of additional steps
around the cycle, it is clear that there is a directed path of length exactly
$n - s$ in $\Gamma(A)$ from every node not in the cycle to some node in the cycle.

Now write $(A^s)^{n-1}$ in block form as

\[ (A^s)^{n-1} = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} \]

where $Y_{11} \in M_s$ and $Y_{22} \in M_{n-s}$. Because $P_1, \dots, P_s$ comprise a cycle in
$\Gamma(A)$, there is a loop at each node $P_1, \dots, P_s$ in $\Gamma(A^s)$. Since $A$ is primitive,
$A^s$ is also primitive, and hence is irreducible. From each node
$P_1, \dots, P_s$ in $\Gamma(A^s)$ there is a path in $\Gamma(A^s)$ of length at most $n - 1$ to any
other node. By first going a sufficient number of times around the loop at
the starting node, we can always construct such a path of length exactly
$n - 1$. This shows that $Y_{11} > 0$ and $Y_{12} > 0$.

To complete the argument we compute

\[ A^{n-s}(A^s)^{n-1} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix} \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} \ge \begin{bmatrix} X_{11}Y_{11} & X_{11}Y_{12} \\ X_{21}Y_{11} & X_{21}Y_{12} \end{bmatrix} \]

Because each row of the $X$ blocks in the last expression contains at least
one nonzero entry, and because each of the $Y$ blocks in the last expression
is positive, the entire block matrix is positive and $A^{n-s}(A^s)^{n-1} > 0$. □
One consequence of (8.5.8) is a celebrated result of H. Wielandt,
which gives a sharp upper bound for the index of primitivity of a general
primitive matrix.


8.5.9 Corollary. If $A \in M_n$ is a nonnegative matrix, then $A$ is primitive
if and only if $A^{n^2-2n+2} > 0$.


Proof: If any power of $A$ is positive, then $A$ is primitive, so only the
converse implication is of interest. If $n = 1$, the result is trivial, so assume
$n > 1$. If $A$ is primitive, then it is irreducible and there are cycles in $\Gamma(A)$.
If the shortest cycle in $\Gamma(A)$ had length $n$, then the length of every cycle
in $\Gamma(A)$ is a multiple of $n$ and hence $A$ could not be primitive by Theorem
(8.5.3). Thus, the length $s$ of the shortest cycle in $\Gamma(A)$ is $n - 1$ or less, and
hence by Theorem (8.5.8) we have

\[ \gamma(A) \le n + s(n-2) \le n + (n-1)(n-2) = n^2 - 2n + 2 \] □


Wielandt gave an example (see Problem 4 at the end of this section) to
show that the bound $\gamma(A) \le n^2 - 2n + 2$ is the best possible bound for
matrices that have all diagonal entries 0. We know that if all main diagonal
entries are positive, then $A$ is primitive if and only if $A^{n-1} > 0$. The
following result of Holladay and Varga uses the same ideas employed in
the proof of Theorem (8.5.8) to provide a bound on the index of primitivity
if some, but perhaps not all, of the main diagonal entries are
positive.


8.5.10 Theorem. Let $A \in M_n$ be nonnegative and irreducible, and suppose
$A$ has $d$ positive main diagonal entries, $1 \le d \le n$. Then $A^{2n-d-1} > 0$;
that is, $\gamma(A) \le 2n - d - 1$.


Proof: Under the hypotheses stated, $A$ must be primitive, and the minimum
length cycle in $\Gamma(A)$ has length 1. In fact, there are $d$ such cycles.
By a permutation, we may assume that $P_1, \dots, P_d$ are the nodes in $\Gamma(A)$
that have loops. Consider $A^{2n-d-1} = A^{n-d}(A^1)^{n-1}$ and write

\[ A^{n-d} = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{bmatrix}, \qquad A^{n-1} = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix} \]

where $X_{11}, Y_{11} \in M_d$ and $X_{22}, Y_{22} \in M_{n-d}$. The same arguments used in
the proof of Theorem (8.5.8) to treat the correspondingly placed blocks
of $A^{n-s}$ and $(A^s)^{n-1}$ show that each row of the blocks $X_{11}$ and $X_{21}$
contains at least one nonzero entry, and the blocks $Y_{11}$ and $Y_{12}$ are
positive. It follows that the product $A^{n-d}A^{n-1}$ is positive by the same
reasoning used in the proof of Theorem (8.5.8). □


Exercise. Show that the matrix A = ro i ] is primitive. What are its
eigenvalues? Compute the bounds on $\gamma(A)$ given by (8.5.9) and (8.5.10).
What is the exact value of $\gamma(A)$?


As a final remark, we note that if one wishes to verify that a given
nonnegative matrix is primitive, then one could check that the matrix is
irreducible and that Wielandt's condition (8.5.9) is met. Matrices arising
in practice frequently have a special structure that makes it easy to see
whether or not the associated directed graph is strongly connected.
Furthermore, if the matrix is irreducible and any main diagonal entry is
positive, then it must be primitive. However, if the matrix is large and there is
no special structure or symmetry to its entries, or if all the main diagonal
entries are zero, then it may be necessary to use Lemma (8.4.1) or
Corollary (8.5.9) to check irreducibility or primitivity. In either case, the
required number of matrix multiplications will be considerably reduced if
the matrix in question is squared repeatedly until the resulting power
exceeds the critical value ($n - 1$ or $n^2 - 2n + 2$, respectively). For example,
if $n = 10$, then calculation of $(I+A)^2$, $(I+A)^4$, $(I+A)^8$, and $(I+A)^{16}$ is
sufficient to verify irreducibility; this is 4 matrix multiplications instead
of the 8 required by a direct application of Lemma (8.4.1). Similarly,
if $A$ is nonnegative, then calculation of $A^2$, $A^4$, $A^8$, $A^{16}$, $A^{32}$, $A^{64}$, and
$A^{128}$ is sufficient to verify primitivity; this is 7 matrix multiplications
instead of 81. Note that we are making implicit use of Problem 3 in these
considerations.
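The repeated-squaring shortcut can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not code from the text: it works only with the zero pattern of $A$ (primitivity depends only on that pattern, as Problem 17 below asserts), squares the Boolean pattern until the exponent reaches at least $n^2 - 2n + 2$, and then tests positivity. It relies on Corollary (8.5.9) together with the fact that $A^k > 0$ implies $A^m > 0$ for all $m \ge k$. The names `is_primitive` and `wielandt` are hypothetical; `wielandt(n)` builds the matrix of Problem 4 (ones on the superdiagonal and in the 1st and 2nd positions of the last row).

```python
def bool_matmul(X, Y):
    """Boolean matrix product on zero patterns."""
    n = len(X)
    return [[any(X[i][k] and Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(A):
    """Return (primitive?, exponent reached, number of squarings)."""
    n = len(A)
    power = [[a > 0 for a in row] for row in A]
    bound = n * n - 2 * n + 2   # Wielandt's bound from Corollary (8.5.9)
    exponent, squarings = 1, 0
    while exponent < bound:
        power = bool_matmul(power, power)  # pattern of A^(2 * exponent)
        exponent *= 2
        squarings += 1
    return all(all(row) for row in power), exponent, squarings

def wielandt(n):
    """Wielandt's matrix of Problem 4, for n >= 3."""
    W = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        W[i][i + 1] = 1
    W[n - 1][0] = 1
    W[n - 1][1] = 1
    return W

result_10 = is_primitive(wielandt(10))
```

For $n = 10$ the loop stops at the exponent 128 after 7 squarings, in agreement with the count given above; a cyclic permutation matrix, by contrast, is reported as not primitive.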


Problems 
1. Write out the proof of Theorem (8.5.1). 


2. If $A \in M_n$ is nonnegative and primitive, and $A^m = [a_{ij}^{(m)}]$, show that
$\lim_{m \to \infty} [a_{ij}^{(m)}]^{1/m} = \rho(A)$ for all $i, j = 1, \dots, n$. Compare this result with
Corollary (5.6.14). Can either part of the hypothesis of primitivity be omitted?


3. Show that if $A \ge 0$ and $A^k > 0$, then $A^m > 0$ for all $m \ge k$. If $A$ is
primitive, show that $A^k$ is primitive for any positive integer $k$. However, if $A$
and $B$ are both primitive, it could be that $AB$ is not primitive. Hint:
Consider |i | and [i il.


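4. Use $\Gamma(A)$ to show that Wielandt's matrix

\[ A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & 0 & \cdots & 0 & 1 \\ 1 & 1 & 0 & \cdots & 0 \end{bmatrix} \in M_n \]

is irreducible and primitive for all $n \ge 3$. Then show that the (1, 1) entry
of $A^{n^2-2n+1}$ is zero but $A^{n^2-2n+2} > 0$. Hint: Think of $A$ as a linear
transformation acting on the standard basis $\{e_1, \dots, e_n\}$. Then $Ae_1 = e_n$,
$Ae_2 = e_1 + e_n$, and $Ae_j = e_{j-1}$ for $j = 3, \dots, n$.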

5. Let $A \in M_n$ be nonnegative and irreducible. Show that $A$ is primitive
if at least one main diagonal entry is positive. Show that this sufficient
condition is necessary for $n = 2$ but not for $n \ge 3$.


6. Let $A = [a_{ij}] \in M_n$ be nonnegative, and suppose $a_{kk} > 0$ for some
$k = 1, 2, \dots, n$. Show that the $k,k$ entry of every power of $A$ is also positive. If
$a_{kk} = 0$ but the $k,k$ entry of $A^2$ is positive, is the $k,k$ entry of $A^3$ positive?
7. Justify in detail the computational shortcuts suggested at the end of 
this section. 

8. If $A$ is any idempotent matrix, then $A = \lim_{m \to \infty} A^m$. Show that if $A$ is
nonnegative, irreducible, and idempotent, then $A$ is a positive matrix of
rank 1.

9. Give an example to show that $\lim_{m \to \infty} [\rho(A)^{-1}A]^m$ can exist even if
$A \ge 0$ is not primitive. Indeed, $A$ can be reducible and can also have
multiple eigenvalues of maximum modulus.


10. Prove the following partial converse of Theorem (8.5.1): If $A \in M_n$
is nonnegative and irreducible, and if $\lim_{m \to \infty} [\rho(A)^{-1}A]^m$ exists, then $A$
is primitive. Hint: If $|\mu| = \rho(A)$, $\mu \ne \rho(A)$, and $Az = \mu z$, $z \ne 0$, then
$[\rho(A)^{-1}A]^m z \to$ ?


11. Show that $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ is irreducible, but $A^2$ is reducible. Does this
contradict (8.5.6)?


12. Give an example of an irreducible nonnegative matrix $A \in M_n$ such
that $\lim_{m \to \infty} [\rho(A)^{-1}A]^m$ does not exist.


13. If $\varepsilon > 0$ and if $A \in M_n$ is nonnegative and irreducible, prove that
$A + \varepsilon I$ is primitive.


14. A nonnegative matrix $A = [a_{ij}]$ is said to be combinatorially symmetric
provided that $a_{ij} > 0$ if and only if $a_{ji} > 0$ for all $i, j = 1, \dots, n$.
Show that if $A$ is combinatorially symmetric and primitive, then $A^{2n-2} > 0$.
Hint: Consider $A^2$ and use (8.5.6) and (8.5.10). Can you strengthen
the bound for $\gamma(A)$, given more information about the cycle structure of
$\Gamma(A)$? Hint: Use (8.5.8).


15. Show that if $A \in M_n$ is nonnegative, irreducible, and nonsingular
with $n$ a prime number, then either (a) $A$ is primitive or (b) all the
eigenvalues of $A$ have maximum modulus and $A$ is similar to the companion
matrix of $x^n - \rho(A)^n = 0$.


16. One way to compute the Perron vector and spectral radius of a
nonnegative matrix $A \in M_n$ is the power method:

\[ x^{(0)} \text{ is an arbitrary positive vector, } \sum_{i=1}^{n} x_i^{(0)} = 1 \]

\[ y^{(m+1)} = A x^{(m)} \quad \text{for all } m = 0, 1, 2, \dots \]

\[ x^{(m+1)} = \frac{y^{(m+1)}}{\sum_{i=1}^{n} y_i^{(m+1)}} \quad \text{for all } m = 0, 1, 2, \dots \]

If $A$ is primitive, show that the sequence of vectors $x^{(m)}$ converges to the
(right) Perron vector of $A$ and that the sequence of numbers $\sum_{i=1}^{n} y_i^{(m)}$
converges to the Perron root of $A$. What is the rate of convergence? Is
the hypothesis of primitivity necessary?


17. If $A \in M_n$ is nonnegative, show that primitivity of $A$ depends only on
the location of the zero entries and not on the magnitude of the nonzero
entries.


18. If $A \in M_n$ is nonnegative, irreducible, and symmetric, show that $A$ is
primitive if and only if $A + \rho(A)I$ is nonsingular. In particular, this
condition is met if $A$ is positive semidefinite. Symmetric nonnegative matrices
with 0's and 1's as entries arise naturally as adjacency matrices of
undirected graphs.


19. If $A \in M_n$ is primitive and $k \ge \gamma(A)$, show that $A^k > 0$.


20. Provide the details for the proof of Theorem (8.5.10). 


21. Calculate the eigenvalues and eigenvectors of each of the following 

matrices and categorize them according to the key concepts of the 

chapter (nonnegative, irreducible, primitive, positive, and so forth): 
11 01 10 10 10 01 00 . 

[i i [i i [i a lo ih [o o [o of lo ol These provide a good 

illustration of the possibilities that can occur. 


22. In the proof of Theorem (8.5.8), show that each column of $X_{11}$ and
$X_{12}$ contains at least one nonzero entry. Show that $Y_{21} > 0$.


Further Reading. For a proof of Romanovsky’s theorem mentioned in 
(8.5.4), see V. Romanovsky, “Recherches sur les Chaines de Markoff,” 
Acta Math. 66 (1936), 147-251. 


8.6 A general limit theorem 


Even if a nonnegative matrix $A$ is irreducible, the normalized powers of
$A$ need have no limit, as the example

\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

easily shows. Nevertheless, there is a precise sense in which, on the
average, this limit does exist.


8.6.1 Theorem. Let $A \in M_n$ be nonnegative and irreducible, let
$Ax = \rho(A)x$, $A^Ty = \rho(A)y$, $x^Ty = 1$, and $L = xy^T$. Then

\[ \lim_{N \to \infty} \frac{1}{N} \sum_{m=1}^{N} [\rho(A)^{-1}A]^m = L \]

Moreover, there exists a finite positive constant $C = C(A)$ such that

\[ \left\| \frac{1}{N} \sum_{m=1}^{N} [\rho(A)^{-1}A]^m - L \right\|_\infty \le \frac{C}{N} \]

for all $N = 1, 2, \dots$.


Proof: If we set $\lambda = \rho(A)$ and choose for $y$ and $x$ the left and right
Perron vectors of $A$, respectively, then hypotheses (1)-(5) of Lemma
(8.2.7) are satisfied and hence the matrix

\[ I - [\rho(A)^{-1}A - L] = \rho(A)^{-1}[\rho(A)I - (A - \rho(A)L)] \]

is invertible. Using (e) of Lemma (8.2.7) and the identity in Problem 1 at
the end of this section, we compute

\[ \frac{1}{N} \sum_{m=1}^{N} [\rho(A)^{-1}A]^m
 = \frac{1}{N} \sum_{m=1}^{N} \left\{ [\rho(A)^{-1}A - L]^m + L \right\}
 = L + \frac{1}{N} \sum_{m=1}^{N} [\rho(A)^{-1}A - L]^m \]

\[ = L + \frac{1}{N} [\rho(A)^{-1}A - L] \left\{ I - [\rho(A)^{-1}A - L]^N \right\} \left\{ I - [\rho(A)^{-1}A - L] \right\}^{-1} \]

\[ = L + \frac{1}{N} [\rho(A)^{-1}A - L] \left\{ I - [\rho(A)^{-1}A]^N + L \right\} \left\{ I - [\rho(A)^{-1}A - L] \right\}^{-1} \]

The only part of the second term in this last expression that depends on
$N$ is the factor $1/N$ and the term $[\rho(A)^{-1}A]^N$, but the entries of the latter
matrix are uniformly bounded as $N \to \infty$ by Corollary (8.1.33). Thus, the
second term is of the order of $1/N$ as $N \to \infty$, and hence it tends
uniformly to zero. □
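The $1/N$ rate in Theorem (8.6.1) can be observed directly on the example $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ from the beginning of this section, for which $\rho(A) = 1$, $x = (1, 1)^T$, $y = (1/2, 1/2)^T$, and every entry of $L$ is $1/2$. The sketch below is illustrative only; the function name `cesaro_errors` is hypothetical, and exact rational arithmetic is used so that the errors are not blurred by rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cesaro_errors(A, rho, L, N_max):
    """Infinity-norm distance between the N-term average and L, N = 1..N_max."""
    n = len(A)
    T = [[Fraction(a, rho) for a in row] for row in A]  # rho(A)^{-1} A
    P = T                                               # current power T^m
    S = [[Fraction(0)] * n for _ in range(n)]           # running sum of powers
    errors = []
    for N in range(1, N_max + 1):
        S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]
        errors.append(max(sum(abs(S[i][j] / N - L[i][j]) for j in range(n))
                          for i in range(n)))
        P = matmul(P, T)
    return errors

half = Fraction(1, 2)
errors = cesaro_errors([[0, 1], [1, 0]], 1, [[half, half], [half, half]], 6)
```

For this matrix the error is exactly $1/N$ when $N$ is odd and $0$ when $N$ is even, so the bound holds with $C = 1$, even though the powers themselves oscillate and have no limit.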


An analysis of the hypotheses required by Lemma (8.2.7) and Corollary
(8.1.33) shows that exactly the same argument proves the following
more general (but less concisely stated) result.


8.6.2 Theorem. Let $A \in M_n$ be nonnegative, and let $x$ and $y$ be
nonnegative vectors such that $Ax = \rho(A)x$ and $A^Ty = \rho(A)y$. If

(a) $\rho(A) > 0$;
(b) $x^Ty > 0$;
(c) the matrix
    $I - [\rho(A)^{-1}A - (x^Ty)^{-1}xy^T]$
    is invertible; and
(d) $[\rho(A)^{-1}A]^m$ is uniformly bounded as $m \to \infty$;

then

\[ \lim_{N \to \infty} \frac{1}{N} \sum_{m=1}^{N} [\rho(A)^{-1}A]^m = (x^Ty)^{-1}xy^T \]

Moreover, there exists a finite positive constant $C = C(A)$ such that

\[ \left\| \frac{1}{N} \sum_{m=1}^{N} [\rho(A)^{-1}A]^m - (x^Ty)^{-1}xy^T \right\|_\infty \le \frac{C}{N} \]

for all $N = 1, 2, \dots$.


Problems

1. If $B \in M_n$ and if $I - B$ is invertible, show that

\[ \sum_{m=1}^{N} B^m = B(I - B^N)(I - B)^{-1} \]

Hint: Multiply by $I - B$.

2. Prove Theorem (8.6.2).

3. Compare the rate of convergence in Theorems (8.5.1) and (8.6.1).
Show that the rate of convergence in (8.6.1) cannot be improved.

4. Suppose $A \in M_n$ is nonnegative and irreducible, and write
$A^m = [a_{ij}^{(m)}]$ for $m = 1, 2, \dots$. Use Theorem (8.6.1) to show for each given pair
$(i, j)$ that $a_{ij}^{(m)} > 0$ for infinitely many values of $m$. This result may be
thought of as a generalization of Theorem (8.5.2). Give an example to
show that there may also be infinitely many values of $m$ for which $a_{ij}^{(m)} = 0$.

5. Under the hypotheses of Theorem (8.6.2), show that $a_{ij}^{(m)} > 0$ for
infinitely many values of $m$ provided the pair $(i, j)$ is such that $x_i y_j \ne 0$.
Why does this result include Problem 4?

6. Show directly that Theorem (8.5.1) implies Theorem (8.6.1) when $A$ is
primitive. Hint: What is required here is the proof of the following result
from analysis: If a sequence is convergent to a finite limit, then it is
Cesàro-summable to the same limit.

7. Consider A = i a] , explicitly compute

\[ \lim_{N \to \infty} N^{-1} \sum_{m=1}^{N} [\rho(A)^{-1}A]^m \]

compute the value of this limit given by Theorem (8.6.1), and compare.


8.7 Stochastic and doubly stochastic matrices


A nonnegative matrix $A \in M_n$ with the property that all its row sums are
+1 is said to be a (row) stochastic matrix because each row may be
thought of as a discrete probability distribution on a sample space with $n$
points. A column stochastic matrix is the transpose of a row stochastic
matrix; such matrices arose naturally in the intercity population migration
model discussed in Section (8.0). Stochastic matrices also arise in the
study of Markov chains and in a variety of modeling problems in such
fields as economics and operations research.

The set of stochastic matrices in $M_n$ is a compact convex set with a
simple but important property. If we denote by $e \in \mathbf{R}^n$ the vector with all
components +1, a nonnegative matrix $A \in M_n$ is stochastic if and only if
$Ae = e$. Thus, the stochastic matrices in $M_n$ form an easily recognized
family of nonnegative matrices with a particular positive eigenvector in
common. Nonnegative matrices with a positive eigenvector have many
special properties [e.g., (8.1.30), (8.1.31), and (8.1.33)] which therefore
are possessed by all stochastic matrices.

A stochastic matrix $A \in M_n$ with the property that $A^T$ is also stochastic
is said to be doubly stochastic; all row and column sums are +1. The set
of doubly stochastic matrices is also a compact convex set in $M_n$, and a
nonnegative matrix $A \in M_n$ is evidently doubly stochastic if and only if
both $Ae = e$ and $e^TA = e^T$. One type of doubly stochastic matrix has
already been encountered in (6.3.5), namely an orthostochastic matrix
$A = [|u_{ij}|^2]$, where $U = [u_{ij}] \in M_n$ is unitary. That the row and column
sums of $A$ are all +1 follows from the fact that the rows and columns of
$U$ are all Euclidean unit vectors.

Another example of doubly stochastic matrices is the set (group) of
permutation matrices. The permutation matrices are really the fundamental
and prototypical doubly stochastic matrices, for Birkhoff's
theorem says that any doubly stochastic matrix is a convex combination
of finitely many permutation matrices. The proof we present for Birkhoff's
theorem relies on the fact (see Appendix B) that every point in a compact
convex set $S$ is a convex combination of the extreme points of $S$. We shall
show that the extreme points of the set of doubly stochastic matrices are
precisely the permutation matrices.


8.7.1 Theorem (Birkhoff). A matrix $A \in M_n$ is a doubly stochastic
matrix if and only if for some $N < \infty$ there are permutation matrices
$P_1, \dots, P_N \in M_n$ and positive scalars $\alpha_1, \dots, \alpha_N \in \mathbf{R}$ such that
$\alpha_1 + \cdots + \alpha_N = 1$ and $A = \alpha_1 P_1 + \cdots + \alpha_N P_N$.


Proof: The sufficiency of the condition is clear; we must establish its
necessity. Let $A = [a_{ij}] \in M_n$ be a given doubly stochastic matrix. If $A$ is a
permutation matrix, then there is precisely one entry +1 in each row and
column and all other entries are 0. If we could write $A = \alpha_1 B + \alpha_2 C$ with
$0 < \alpha_1, \alpha_2 < 1$, $\alpha_1 + \alpha_2 = 1$, and $B$, $C$ doubly stochastic, then every entry of
$B$ and $C$ that corresponds to a 0 entry $a_{ij} = 0$ of $A$ must satisfy
$0 = a_{ij} = \alpha_1 b_{ij} + \alpha_2 c_{ij}$, so $b_{ij} = c_{ij} = 0$ since $\alpha_1$ and $\alpha_2$ are both nonzero and
$b_{ij}$, $c_{ij}$ are nonnegative. Since $B$ and $C$ are doubly stochastic, their row
sums are +1 and hence nonzero entries must all be +1 and in the same
positions as the nonzero entries of $A$; that is, $A = B = C$. This shows that


every permutation matrix is an extreme point of the set of doubly
stochastic matrices.

On the other hand, if $A$ is not a permutation matrix there is at least
one row of $A$, say row $i_1$, that contains at least two nonzero entries. In
that row, choose any nonzero entry $a_{i_1 i_2}$, which must satisfy $0 < a_{i_1 i_2} < 1$
since there are at least two nonzero entries in row $i_1$, and the sum of all the
(nonnegative) entries in the row is +1. Since $0 < a_{i_1 i_2} < 1$ and the sum of
all the (nonnegative) entries in column $i_2$ is +1, there must be some other
nonzero entry $a_{i_3 i_2}$, $i_3 \ne i_1$, in the same column as $a_{i_1 i_2}$, and $0 < a_{i_3 i_2} < 1$.
By the same reasoning, there is some other nonzero entry $a_{i_3 i_4}$, $i_4 \ne i_2$, in
the same row as $a_{i_3 i_2}$, and $0 < a_{i_3 i_4} < 1$. If this process is continued and the
successive entries chosen in this way are marked, after finitely many steps
there will be a first time that an entry $a_{ij}$ is chosen that has previously
been chosen. The sequence of entries from the first up to the second
occurrence of the entry $a_{ij}$ (including the first but not the second
occurrence of $a_{ij}$) is a finite ordered sequence of entries of $A$, each successive
pair of which is alternately in the same row or column; let $a_{pq}$ be the
smallest (positive) entry in this sequence. Let $B \in M_n$ be a matrix in which
+1 occurs in the same position as the first entry $a_{ij}$ of the sequence, −1
occurs in the same position as the second entry of the sequence, +1 in the
same position as the third entry of the sequence, and so on alternately
choosing ±1. All the other entries of $B$ are 0. Notice that all the row and
column sums of $B$ are 0. Let $A_+ = A + a_{pq}B$ and $A_- = A - a_{pq}B$. Notice
that both $A_+$ and $A_-$ are nonnegative matrices (because of the minimality
property of $a_{pq}$) whose row and column sums are +1 (because the row
and column sums of $B$ are 0), so $A_+$ and $A_-$ are doubly stochastic. Since
$A = \frac{1}{2}A_+ + \frac{1}{2}A_-$ and $A_+ \ne A_-$, we conclude that $A$ is not an extreme point
of the set of doubly stochastic matrices.

The argument just presented shows that a given matrix is an extreme
point of the compact convex set of doubly stochastic matrices if and only
if it is a permutation matrix. The theorem follows from the fact that
every point in a compact convex set is a convex combination of extreme
points. □


Since there are exactly $n!$ distinct permutation matrices in $M_n$,
Birkhoff's theorem ensures that any doubly stochastic matrix can be
expressed as a convex combination of at most $N = n!$ permutation
matrices. A more refined analysis shows that not more than $N = n^2 - 2n + 2$
terms are needed.


Problems


1. Let $A \in M_n$ be a nonnegative nonzero matrix with a positive
eigenvector $x = [x_i]$, and let $D = \operatorname{diag}(x_1, \dots, x_n)$. Show that $\rho = \rho(A) > 0$, that
$Ax = \rho x$, and that $ADe = \rho De$, where $e \in \mathbf{R}^n$ has all entries
+1. Conclude that $A$ is similar (via a diagonal similarity matrix with
positive main diagonal entries) to a positive multiple [namely $\rho(A)$] of a
stochastic matrix. This observation permits many questions about
nonnegative matrices with a positive eigenvector to be reduced to questions
about stochastic matrices.

2. Show that the sets of stochastic and doubly stochastic matrices in $M_n$
are compact convex sets.

3. Show that the sets of stochastic and doubly stochastic matrices in $M_n$
constitute a semigroup under matrix multiplication; that is, if
$A, B \in M_n$ are (doubly) stochastic, then $AB$ is (doubly) stochastic.

8 M,

5. Show that a 2-by-2 doubly stochastic matrix is symmetric with equal
diagonal entries.

6. Use the ideas employed in the proof of (8.7.1) to (a) give an alternate
proof that does not use results of Appendix B and (b) give an algorithm
for decomposing a doubly stochastic matrix as a convex combination
of permutations. Hint: If $A$ is not a permutation, use the sequence
of entries indicated in the proof to produce a permutation, a positive
multiple of which may be subtracted from $A$ to leave a nonnegative
matrix with equal row and column sums and at least one fewer nonzero
entries. Now, repeat the argument on this matrix and continue.

7. Show that the decomposition in (8.7.1) is not unique.

8. If a doubly stochastic matrix $A$ is reducible, show that $A$ is actually
permutation-similar to a matrix of the form
$\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}$. What may be said
about $A_1$ and $A_2$?


Further Reading. The idea of the proof of (8.7.1) is contained in B.
Saunders and H. Schneider, "Applications of the Gordon-Stiemke
Theorem in Combinatorial Matrix Theory," SIAM Rev. 21 (1979),
528-541, where related facts may be found. For a discussion of the fact
that every doubly stochastic matrix $A \in M_n$ is a convex combination of at
most $n^2 - 2n + 2$ permutation matrices, see M. Marcus and R. Ree,
"Diagonals of Doubly Stochastic Matrices," Quart. J. Math. Oxford, Ser.
2, 10 (1959), 295-302.
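A constructive reading of Birkhoff's theorem (8.7.1) can be sketched in code. The version below is not the argument given in the proof: it is a standard matching-based variant, assumed here for illustration, in which one repeatedly finds a permutation supported on the positive entries of the current matrix (via augmenting paths in the associated bipartite graph), subtracts the largest admissible multiple of it, and continues until nothing is left. The names `find_permutation` and `birkhoff_decomposition` are hypothetical, and the input is assumed to be an exactly doubly stochastic matrix with rational entries.

```python
from fractions import Fraction

def find_permutation(M):
    """Return sigma with M[i][sigma[i]] > 0 for every row i, or None."""
    n = len(M)
    match_col = [-1] * n  # match_col[j] is the row currently matched to column j

    def try_row(i, seen):
        # Augmenting-path search for a column that row i can use.
        for j in range(n):
            if M[i][j] > 0 and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    sigma = [0] * n
    for j, i in enumerate(match_col):
        sigma[i] = j
    return sigma

def birkhoff_decomposition(A):
    """Return a list of (alpha, sigma) pairs with A = sum of alpha * P_sigma."""
    n = len(A)
    R = [[Fraction(a) for a in row] for row in A]  # remainder still to decompose
    terms = []
    while any(R[i][j] != 0 for i in range(n) for j in range(n)):
        sigma = find_permutation(R)
        if sigma is None:
            raise ValueError("input does not appear to be doubly stochastic")
        alpha = min(R[i][sigma[i]] for i in range(n))
        for i in range(n):
            R[i][sigma[i]] -= alpha  # at least one entry becomes zero
        terms.append((alpha, sigma))
    return terms

q, h = Fraction(1, 4), Fraction(1, 2)
example = [[h, h, 0], [q, q, h], [q, q, h]]
terms = birkhoff_decomposition(example)
```

Each pass zeroes at least one entry, so the loop ends after finitely many steps. Which permutations and weights come out depends on the order in which matchings are found, which is consistent with the non-uniqueness asserted in Problem 7.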


APPENDIX A 


Complex numbers 


A complex number has the form

$$z = a + ib$$

in which $a$ and $b$ are real numbers and $i$ is a formal symbol satisfying
the relation $i^2 = -1$. The real number $a$ is called the real part of $z$
and is denoted $\operatorname{Re} z$; the real number $b$ is called the
imaginary part of $z$ and is denoted $\operatorname{Im} z$. The complex
conjugate $\bar{z}$ of the complex number $z = a + ib$ is $\bar{z} = a - ib$.
If $z_1 = a_1 + ib_1$ and $z_2 = a_2 + ib_2$ are complex numbers, then the
binary operations of addition and multiplication are defined in the
following natural ways in terms of the corresponding operations for real
numbers:

$$z_1 + z_2 = (a_1 + a_2) + i(b_1 + b_2), \qquad z_1 z_2 = a_1 a_2 - b_1 b_2 + i(a_1 b_2 + a_2 b_1)$$


Thus, addition is the result of adding real parts and adding imaginary
parts, and multiplication is the result of algebraic expansion together
with the relation $i^2 = -1$. The additive inverse of $z = a + ib$ is
$-z = -a + i(-b)$, and, as long as $z \neq 0 = 0 + i0$, the multiplicative
inverse of $z$ is

$$\frac{1}{z} = \frac{a - ib}{a^2 + b^2} = \frac{a}{a^2 + b^2} + i\left(\frac{-b}{a^2 + b^2}\right)$$

Subtraction and division of complex numbers $z_1$ and $z_2$ are defined by

$$z_1 - z_2 = z_1 + (-z_2), \qquad \frac{z_1}{z_2} = z_1 \cdot \frac{1}{z_2} = \frac{z_1 \bar{z}_2}{z_2 \bar{z}_2}$$

The set of all complex numbers is denoted by C; the operations of addition
and multiplication are commutative, and C constitutes a field under
these operations, with the real number $0 = 0 + i0$ as additive identity and
the real number $1 = 1 + i0$ as multiplicative identity. The real numbers R
form a subfield of C; the absolute value (or modulus) of $z$, denoted $|z|$,
is defined by $|z| = +(z\bar{z})^{1/2}$, which is always a nonnegative real
number. The quotient $z_1/z_2$ is then $(1/|z_2|^2)\, z_1 \bar{z}_2$, if
$z_2 \neq 0$. The operations of multiplication and complex conjugation are
easily verified to commute, $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$, and
the complex conjugate of the complex conjugate is the original complex
number again. Since $\operatorname{Re} z = (1/2)(z + \bar{z})$ and
$\operatorname{Im} z = (1/2i)(z - \bar{z})$, the real numbers are just those
$z \in \mathbf{C}$ such that $\operatorname{Im} z = 0$, or equivalently,
$z = \bar{z}$ $(= \operatorname{Re} z)$.
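The arithmetic rules above can be checked numerically. The following Python sketch represents a complex number a + ib as the ordered pair (a, b); the helper names `c_mul`, `c_inv`, and `c_div` are hypothetical and chosen only for this illustration. Python's built-in `complex` type implements the same operations and can be used to compare results.

```python
def c_mul(z1, z2):
    """Multiply (a1, b1) and (a2, b2), each standing for a + ib."""
    a1, b1 = z1
    a2, b2 = z2
    return (a1 * a2 - b1 * b2, a1 * b2 + a2 * b1)


def c_inv(z):
    """Multiplicative inverse of a nonzero a + ib, via (a - ib)/(a^2 + b^2)."""
    a, b = z
    d = a * a + b * b
    if d == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (a / d, -b / d)


def c_div(z1, z2):
    """Quotient z1/z2 computed as z1 * (1/z2)."""
    return c_mul(z1, c_inv(z2))


z1, z2 = (3.0, 4.0), (1.0, -2.0)
product = c_mul(z1, z2)    # compare with complex(3, 4) * complex(1, -2)
quotient = c_div(z1, z2)   # compare with complex(3, 4) / complex(1, -2)
modulus = (z1[0] ** 2 + z1[1] ** 2) ** 0.5   # |3 + 4i|
```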

Geometrically, the complex numbers C may be thought of as a plane
with origin at 0 and a “real axis” and “imaginary axis.” Thus, $z = a + ib$
may be identified with the ordered pair $(a, b)$. The real axis
$\{z : \operatorname{Im} z = 0\}$ is just the usual real line, and the
imaginary axis $\{z : \operatorname{Re} z = 0\}$ is just $i$ times the real
line or all “pure imaginary” numbers. The projection of $z \in \mathbf{C}$
onto the real axis (imaginary axis) is $\operatorname{Re} z$
($i \operatorname{Im} z$). Complex conjugation is reflection across the real
axis, and $|z|$ is the Euclidean distance of $z$ from the origin in the
complex plane. The open (closed) right half-plane of C is
$\{z \in \mathbf{C} : \operatorname{Re} z > (\geq)\, 0\}$, and the open
(closed) upper half-plane of C is
$\{z \in \mathbf{C} : \operatorname{Im} z > (\geq)\, 0\}$. The unit disc of C
is $\{z \in \mathbf{C} : |z| \leq 1\}$, and the disc about $a \in \mathbf{C}$
of radius $r$ is $\{z \in \mathbf{C} : |z - a| \leq r\}$.

The last paragraph described the complex plane C in terms of rectangular
coordinates. The complex plane may also be described usefully in terms of
polar coordinates, in which the position of $z \in \mathbf{C}$ in the plane
is described in terms of the radius $r$ of the circle about the origin on
which $z$ lies and the angle $\theta$, measured counterclockwise around from
the real line, of a directed ray from the origin on which $z$ lies. The
polar coordinates of $z$ are then $(r, \theta)$, and the notation
$z = re^{i\theta}$ is used, in which $e^{i\theta} = \cos\theta + i\sin\theta$.
The angle $\theta$ is the argument of $z$, written $\theta = \arg z$; since
$e^{i\theta} = e^{i(\theta + 2\pi)}$, $\arg z$ is only determined mod $2\pi$.
If $z = a + ib$ in rectangular coordinates and $z = re^{i\theta}$ in polar
coordinates, the transformation from polar to rectangular coordinates is

$$a = r\cos\theta, \qquad b = r\sin\theta$$

and from rectangular to polar coordinates when $r \neq 0$ is

$$r = |z| = (a^2 + b^2)^{1/2}, \qquad \theta = \arcsin\frac{b}{r} = \arg z$$

in which we generally take $0 \leq \theta < 2\pi$. Circular objects are often
easily described in polar coordinates. The unit disc in C, for example, is
$\{re^{i\theta} : 0 \leq r \leq 1,\ 0 \leq \theta < 2\pi\}$.
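The two coordinate changes can be sketched in Python as follows. The function names `to_polar` and `to_rectangular` are hypothetical. One substitution is made for this sketch: the angle comes from the standard library function `cmath.phase`, which uses both a and b to pick the correct quadrant, and the result is then reduced mod 2π to match the convention 0 ≤ θ < 2π.

```python
import cmath
import math


def to_polar(z):
    """Return (r, theta) with r = |z| and 0 <= theta < 2*pi."""
    r = abs(z)
    theta = cmath.phase(z) % (2 * math.pi)  # phase() returns a value in [-pi, pi]
    return r, theta


def to_rectangular(r, theta):
    """Return a + ib with a = r cos(theta) and b = r sin(theta)."""
    return complex(r * math.cos(theta), r * math.sin(theta))


z = complex(-1.0, -1.0)
r, theta = to_polar(z)            # r = sqrt(2), theta = 5*pi/4
back = to_rectangular(r, theta)   # round trip back to -1 - i
```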


APPENDIX B 


Convex sets and functions 


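Let V be a vector space over a field that contains the real numbers. A
convex combination of a selection $v_1, \ldots, v_k \in V$ of elements of V
is a linear combination whose coefficients are nonnegative and sum to 1:

$$\alpha_1 v_1 + \cdots + \alpha_k v_k; \qquad \alpha_1, \ldots, \alpha_k \geq 0, \quad \sum_{i=1}^{k} \alpha_i = 1$$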


A subset K of V is said to be convex if any convex combination of any
selection of elements from K lies in K. Equivalently, K is convex if all
convex combinations of pairs of points in K are again in K. Geometrically,
this may be interpreted as saying that the line segment joining any two
points of K must lie in K; that is, K has no “dents” or “holes.” A convex
set K for which $\alpha x \in K$ whenever $\alpha > 0$ and $x \in K$ is called
a convex cone (equivalently, positive linear combinations from K are in K).
It is straightforward to verify that both the set sum and the intersection
of two convex sets (respectively, convex cones) is again a convex set
(respectively, convex cone).

Now let V be a real or complex vector space with a given norm, so one
can speak of open, closed, and compact sets in V. An extreme point of a
closed convex set K is a point $z \in K$ that may be written as a convex
combination of points from K in only a trivial way; that is,
$z = \alpha x + (1 - \alpha) y$, $0 < \alpha < 1$, $x, y \in K$, implies
$x = y = z$. A closed convex set may have a finite number of extreme points
(e.g., a polyhedron), infinitely many extreme points (e.g., a closed disc),
or no extreme points (e.g., the closed upper half-plane in $\mathbf{R}^2$).
A compact convex set always has extreme points.




containing S. The Krein-Milman theorem says that a compact convex set 
is the closure of the convex hull of its extreme points. A compact convex 
set is said to be finitely generated if it has finitely many extreme points, 
the extreme points being called generators of the convex set. 

Now suppose V is a real inner product space with inner product
$\langle \cdot, \cdot \rangle$. The separating hyperplane theorem states that
if $K_1, K_2 \subseteq V$ are two given nonempty nonintersecting convex sets
with $K_1$ closed and $K_2$ compact, then there exists a hyperplane H in V
such that $K_1$ lies in one of the closed half-spaces determined by H while
$K_2$ lies in the other; that is, H separates $K_1$ and $K_2$. A hyperplane H
in V is just a translation of the orthogonal complement of a
one-dimensional subspace of V:
$H = \{x \in V : \langle x - p, q \rangle = 0\}$ for given vectors
$p, q \in V$, $q \neq 0$. The hyperplane H determines two open half-spaces:
$H^+ = \{x \in V : \langle x - p, q \rangle > 0\}$,
$H^- = \{x \in V : \langle x - p, q \rangle < 0\}$. The sets
$H_0^+ = H^+ \cup H$ and $H_0^- = H^- \cup H$ are the closed half-spaces
determined by H. Thus, separation means that $K_1 \subseteq H_0^+$ and
$K_2 \subseteq H_0^-$ for some vectors p, q. There are various
strengthenings of the separation conclusion depending upon additional
assumptions about the two convex sets. For example, if the closures of
$K_1$ and $K_2$ do not intersect, then the separation may be taken to be
strict; that is, $K_1 \subseteq H^+$, $K_2 \subseteq H^-$. The closure of the
convex hull of any bounded set $S \subseteq V$ can be obtained as the
intersection of all closed half-spaces that contain S.
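In the event that V is the vector space $\mathbf{C}^n$ with complex inner
product $\langle \cdot, \cdot \rangle$, hyperplanes and half-spaces are
defined similarly, except that $\mathbf{C}^n$ must be identified with
$\mathbf{R}^{2n}$ and $\langle \cdot, \cdot \rangle$ must be replaced with the
real inner product $\operatorname{Re}\langle \cdot, \cdot \rangle$ as follows.
Identify $x + iy \in \mathbf{C}^n$ with
$\begin{bmatrix} x \\ y \end{bmatrix} \in \mathbf{R}^{2n}$, and note that
$\operatorname{Re}\langle x_1 + iy_1, x_2 + iy_2 \rangle = \langle x_1, x_2 \rangle + \langle y_1, y_2 \rangle$
by conjugate linearity of the complex inner product. Then
$\langle x_1, x_2 \rangle + \langle y_1, y_2 \rangle$ is the (real) inner
product of $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and
$\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$, and hyperplanes and half-spaces
defined in $\mathbf{R}^{2n}$ have the appropriate geometric interpretation.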

A real-valued function f defined on a convex set $K \subseteq V$ is said to
be convex if

$$f(\alpha x + (1 - \alpha) y) \leq \alpha f(x) + (1 - \alpha) f(y) \qquad (*)$$

for all $0 < \alpha < 1$ and all $x, y \in K$, $y \neq x$. If the above
inequality is always strict, then f is called strictly convex. If the above
inequality is reversed for all $0 < \alpha < 1$ and all $x, y \in K$,
$y \neq x$, then f is called concave (or strictly concave if it is reversed
and always strict). Equivalently, a concave (respectively strictly concave)
function is just the negative of a convex (respectively strictly convex)
function. Geometrically, the chord joining any two function values f(x) and
f(y) lies above (respectively below) the graph of a convex (respectively
concave) function. A linear function is both convex and concave. In the
case $V = \mathbf{R}^n$, and K an open set, the Hessian

$$H(x) = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j}(x) \right]$$

which is a symmetric matrix in $M_n(\mathbf{R})$, exists almost everywhere in
K for a convex function f and is necessarily positive semidefinite for
points in K at which it exists. It is positive definite in the strictly
convex case. Conversely, a function whose Hessian is positive semidefinite
(respectively positive definite) throughout a convex set is convex
(respectively strictly convex). Similarly, negative definiteness
corresponds to concavity.
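As a small illustration of the Hessian criterion, the Python sketch below uses the sample function f(x, y) = x² + xy + y², which is an example chosen here and not taken from the text. Its Hessian is constant, and positive definiteness of a symmetric 2-by-2 matrix is tested through its leading principal minors. The defining inequality (*) is then spot-checked at a few arbitrarily chosen points; such a spot check illustrates the inequality but does not prove convexity.

```python
def f(x, y):
    """A sample quadratic: f(x, y) = x^2 + x*y + y^2."""
    return x * x + x * y + y * y


# Hessian of f (constant for a quadratic): [[f_xx, f_xy], [f_xy, f_yy]].
HESSIAN = [[2.0, 1.0], [1.0, 2.0]]


def is_positive_definite_2x2(h):
    """A symmetric 2-by-2 matrix is positive definite iff h11 > 0 and det > 0."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return h[0][0] > 0 and det > 0


def satisfies_convexity(p, q, alpha):
    """Check inequality (*) at one pair of points and one alpha in (0, 1)."""
    mix = tuple(alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q))
    return f(*mix) <= alpha * f(*p) + (1 - alpha) * f(*q)


samples = [((1.0, -2.0), (-3.0, 0.5)),
           ((0.0, 0.0), (2.0, 2.0)),
           ((4.0, 1.0), (-1.0, -1.0))]
checks = [satisfies_convexity(p, q, a)
          for p, q in samples for a in (0.1, 0.5, 0.9)]
```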
Optimization of convex and concave functions has some pleasant 
properties. On a compact convex set the maximum (respectively, mini- 
mum) of a convex (respectively, concave) function is attained at an ex- 
treme point. On the other hand, on a convex set, the set of points at 
which the minimum (respectively maximum) of a convex (respectively 
concave) function is attained is convex and any local minimum (respec- 
tively maximum) is a global minimum (respectively maximum). For 
example, a strictly convex function attains a minimum at at most one 
point of a convex set, and a critical point is necessarily a minimum. 
Convex combinations of real numbers obey some simple but frequently
useful inequalities. If $x_1, \ldots, x_k$ are given real numbers, then

$$\min_{1 \leq i \leq k} x_i \leq \sum_{i=1}^{k} \alpha_i x_i \leq \max_{1 \leq i \leq k} x_i$$

for any convex combination $\alpha_1, \alpha_2, \ldots, \alpha_k \geq 0$ and
$\alpha_1 + \cdots + \alpha_k = 1$.
Consideration of certain simple convex functions $f(\cdot)$ of one variable
on an interval leads to various classical inequalities. One can use
induction to show that the defining two-point inequality (*) on the
interval implies an n-point inequality

$$f\left( \sum_{i=1}^{n} \alpha_i x_i \right) \leq \sum_{i=1}^{n} \alpha_i f(x_i), \qquad n = 2, 3, \ldots \qquad (**)$$

whenever $\alpha_i \geq 0$, $\alpha_1 + \cdots + \alpha_n = 1$, and all $x_i$
are in the interval.
Application of (**) to the strictly convex function $f(x) = -\log x$ over
the interval $(0, \infty)$ leads to the weighted arithmetic-geometric mean
inequality

$$\sum_{i=1}^{n} \alpha_i x_i \geq \prod_{i=1}^{n} x_i^{\alpha_i}, \qquad x_i \geq 0$$

which contains the arithmetic-geometric mean inequality

$$\frac{1}{n} \sum_{i=1}^{n} x_i \geq \left( \prod_{i=1}^{n} x_i \right)^{1/n}, \qquad x_i \geq 0$$

when all $\alpha_i = 1/n$. Equality holds if and only if all $x_i$ are equal.
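A numerical spot check of the weighted arithmetic-geometric mean inequality, and of the n-point inequality (**) for f(x) = −log x from which it follows, can be sketched in Python as below. The data and the helper names are arbitrary choices for illustration; `math.prod` requires Python 3.8 or later.

```python
import math


def weighted_arithmetic_mean(xs, alphas):
    """sum_i alpha_i * x_i"""
    return sum(a * x for a, x in zip(alphas, xs))


def weighted_geometric_mean(xs, alphas):
    """prod_i x_i ** alpha_i"""
    return math.prod(x ** a for a, x in zip(alphas, xs))


xs = [1.0, 4.0, 16.0]
alphas = [0.5, 0.25, 0.25]                  # nonnegative, summing to 1

am = weighted_arithmetic_mean(xs, alphas)   # 0.5 + 1 + 4 = 5.5
gm = weighted_geometric_mean(xs, alphas)    # 1 * 4**0.25 * 16**0.25 = 2*sqrt(2)

# The n-point inequality (**) for f(x) = -log x is the same statement in logs.
jensen_lhs = -math.log(am)
jensen_rhs = sum(a * -math.log(x) for a, x in zip(alphas, xs))

# Equality case: all x_i equal.
equal_case = weighted_geometric_mean([3.0, 3.0, 3.0], alphas)
```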




Application of (**) with $f(x) = x^p$, $p > 1$, over the interval
$(0, \infty)$ leads to Hölder’s inequality

$$\sum_{i=1}^{n} x_i y_i \leq \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} \left( \sum_{i=1}^{n} y_i^q \right)^{1/q}$$

where $x_i, y_i > 0$, $p > 1$, and $1/p + 1/q = 1$. Equality holds if and only
if the vectors $[x_i^p]$ and $[y_i^q]$ are dependent. If we take
$p = q = 2$, we obtain a version of the Cauchy-Schwarz inequality

$$\sum_{i=1}^{n} x_i y_i \leq \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} y_i^2 \right)^{1/2}$$

Equality holds if and only if the vectors $[x_i]$ and $[y_i]$ are dependent.
From Hölder’s inequality one can deduce Minkowski’s inequality

$$\left[ \sum_{i=1}^{n} (x_i + y_i)^p \right]^{1/p} \leq \left( \sum_{i=1}^{n} x_i^p \right)^{1/p} + \left( \sum_{i=1}^{n} y_i^p \right)^{1/p}$$

where $x_i, y_i > 0$ and $p \geq 1$. Equality holds if and only if the vectors
$[x_i]$ and $[y_i]$ are dependent.
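The inequalities of Hölder and Minkowski can be spot-checked numerically. In the Python sketch below the vectors and the helper names `p_norm`, `holder_gap`, and `minkowski_gap` are arbitrary choices for illustration; each gap is the right-hand side minus the left-hand side and should be nonnegative, up to floating-point round-off.

```python
def p_norm(v, p):
    """(sum_i v_i^p)^(1/p) for a vector of positive numbers."""
    return sum(t ** p for t in v) ** (1.0 / p)


def holder_gap(x, y, p):
    """Right side minus left side of Holder's inequality, with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    return p_norm(x, p) * p_norm(y, q) - sum(a * b for a, b in zip(x, y))


def minkowski_gap(x, y, p):
    """Right side minus left side of Minkowski's inequality."""
    summed = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(summed, p)


x = [1.0, 2.0, 3.0]
y = [4.0, 1.0, 2.0]
gaps = [(holder_gap(x, y, p), minkowski_gap(x, y, p)) for p in (1.5, 2.0, 3.0)]

# p = q = 2 with y a multiple of x: the Cauchy-Schwarz equality case.
cs_equality_gap = holder_gap(x, [2.0 * t for t in x], 2.0)
```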


Further Readings. For more information about convex sets and geometry 
see [Val]. For more about convex functions and inequalities see [Boa] 
and [BB]. 


APPENDIX C 


The fundamental theorem of algebra 


One historical motivation for introducing the complex numbers C was
that polynomials with real coefficients may have nonreal complex zeroes.
For example, the quadratic formula reveals that the equation
$x^2 - 2x + 2 = 0$ has roots (solutions) $\{1 + i, 1 - i\}$. All zeroes of
any polynomial with real coefficients, however, are contained in C. In
fact, if the field of possible coefficients is extended to C, all zeroes
of all polynomials with complex coefficients are still contained in C.
Thus, C is an example of an algebraically closed field: that is, there is
no field F such that C is a subfield of F, and such that there is a
polynomial with coefficients from C and with a zero in F that is not in C.

The fundamental theorem of algebra states that any polynomial $p(x)$,
of degree at least 1, with complex coefficients has at least one zero $z$
[i.e., $z$ is a root of the equation $p(x) = 0$] among the complex numbers.
Using synthetic division, if $z$ is a zero of $p(x)$, then $x - z$ divides
$p(x)$; that is, $p(x) = (x - z)q(x)$, in which $q(x)$ is a polynomial with
complex coefficients, whose degree is 1 smaller than that of $p(x)$. The
zeroes of $p(x)$ are then those of $q(x)$, together with $z$. The following
is a consequence of the fundamental theorem of algebra.


Theorem. A polynomial of degree $n \geq 1$ with complex coefficients has,
counting multiplicities, exactly n zeroes among the complex numbers.


The multiplicity of a root $z$ of $p(x) = 0$ is the largest integer $k$ for
which $(x - z)^k$ divides $p(x)$, that is, the “number” of times $z$ occurs
as a root of $p(x) = 0$. If a root $z$ has multiplicity 3, then it is
counted 3 times toward the number n of roots of $p(x) = 0$. It follows that
a polynomial with complex coefficients may always be factored into a
product of linear factors over the complex numbers.
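Synthetic division and the notion of multiplicity can be illustrated with a short Python sketch. The function name `synthetic_division` and the sample polynomial are choices made here for illustration; the scheme itself is Horner's rule, whose last value is p(z), so the remainder vanishes exactly when z is a zero.

```python
def synthetic_division(coeffs, z):
    """Divide p(x) by (x - z) using Horner's scheme.

    coeffs lists the coefficients of p from the leading term down to the
    constant term. Returns (quotient coefficients, remainder); the remainder
    equals p(z).
    """
    values = [coeffs[0]]
    for c in coeffs[1:]:
        values.append(c + z * values[-1])
    return values[:-1], values[-1]


# p(x) = x^3 - 4x^2 + 5x - 2 = (x - 1)^2 (x - 2)
p = [1, -4, 5, -2]
q1, r1 = synthetic_division(p, 1)    # q1 represents x^2 - 3x + 2, r1 = 0
q2, r2 = synthetic_division(q1, 1)   # q2 represents x - 2, r2 = 0
q3, r3 = synthetic_division(q2, 1)   # r3 is nonzero: 1 is not a zero of x - 2
```

Here the division by x − 1 succeeds with zero remainder exactly twice, so the zero z = 1 of this sample polynomial has multiplicity 2.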
If a polynomial $p(x)$ with real coefficients has some nonreal complex
zeroes, however, they must occur in conjugate pairs, since, if $0 = p(z)$,
then $0 = \bar{0} = \overline{p(z)} = p(\bar{z})$. It follows, since

$$(x - z)(x - \bar{z}) = x^2 - 2\operatorname{Re}(z)\,x + |z|^2$$

that any real polynomial may be factored into a product of powers of
linear and quadratic factors over the reals, each irreducible quadratic
factor corresponding to a conjugate pair of complex roots.
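The pairing of conjugate zeroes into a real quadratic factor can be illustrated in Python with the equation x² − 2x + 2 = 0. The helper names below are hypothetical; `cmath.sqrt` is the standard library complex square root.

```python
import cmath


def quadratic_roots(a, b, c):
    """Both zeroes of a*x^2 + b*x + c (a != 0) by the quadratic formula."""
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


def real_quadratic_from_pair(z):
    """Coefficients (1, -2 Re z, |z|^2) of (x - z)(x - conj(z))."""
    return 1.0, -2.0 * z.real, abs(z) ** 2


z_plus, z_minus = quadratic_roots(1, -2, 2)       # zeroes of x^2 - 2x + 2
coefficients = real_quadratic_from_pair(z_plus)   # recovers 1, -2, 2 up to round-off
```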


Further Readings. For an elementary proof of the fundamental theorem 
of algebra, see [Chi]. 


APPENDIX D 


Continuous dependence of the zeroes of a 
polynomial on its coefficients 


It is an important fact, most readily proved using complex analysis,
that the n zeroes of a polynomial of degree $n \geq 1$ with complex
coefficients depend continuously upon the coefficients.

For $x \in \mathbf{C}^n$, let $f(x) = [f_1(x), \ldots, f_m(x)]^T$, in which
$f_i : \mathbf{C}^n \to \mathbf{C}$, $i = 1, \ldots, m$. The function
$f : \mathbf{C}^n \to \mathbf{C}^m$ is continuous at $x$ if each $f_i$ is
continuous at $x$, $i = 1, \ldots, m$. The function
$f_i : \mathbf{C}^n \to \mathbf{C}$ is continuous at $x$ if, for each
$\varepsilon > 0$ there is a $\delta > 0$ such that if $\|y - x\| < \delta$,
then $|f_i(y) - f_i(x)| < \varepsilon$, where $\|\cdot\|$ is a vector norm on
$\mathbf{C}^n$.

The continuous dependence result could be stated intuitively by saying
that the function $f : \mathbf{C}^n \to \mathbf{C}^n$, which takes the n
coefficients (all but the leading 1) of a monic polynomial of degree n to
the n zeroes of the polynomial, is continuous. There is a problem,
however; there is no simple way to define this function, since there is no
natural way to define an ordering among the n zeroes. As a precise
statement of the continuous dependence of the zeroes on the coefficients
of a polynomial, we offer the following.


Theorem. Let $n \geq 1$ and let

$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \qquad a_n \neq 0$$

be a polynomial with complex coefficients. Then, for every
$\varepsilon > 0$, there is a $\delta > 0$ such that for any polynomial

$$q(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0$$

satisfying $b_n \neq 0$ and

$$\max_{0 \leq i \leq n} |a_i - b_i| < \delta$$

we have

$$\min_{\tau} \max_{1 \leq j \leq n} |\lambda_j - \mu_{\tau(j)}| < \varepsilon$$

where $\lambda_1, \ldots, \lambda_n$ are the zeroes of $p(x)$ and
$\mu_1, \ldots, \mu_n$ are the zeroes of $q(x)$ in some order, counting
multiplicities, and the minimum is taken over all permutations $\tau$ of
$1, 2, \ldots, n$.


Thus, sufficiently small changes in the coefficients of a polynomial can
lead only to small changes in any zero. This principle is of fundamental
importance in matrix analysis because the coefficients of the
characteristic polynomial $p_A(t)$ of a matrix $A \in M_n$ are continuous
(in fact, polynomial) functions of the entries of A (1.2.11) and the
zeroes of $p_A(t)$ are the eigenvalues of A. Since the composition of
continuous functions is continuous, sufficiently small changes in the
entries of A will cause only small changes in the coefficients of
$p_A(t)$, which result in small changes in the eigenvalues. Thus, the
eigenvalues of a square real or complex matrix depend continuously upon
its entries.
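The quantity bounded in the theorem, the minimum over permutations of the largest distance between matched zeroes, can be computed directly for small examples. The Python sketch below restricts itself to quadratics so that the zeroes come from the quadratic formula; the function names and the two sample polynomials are assumptions made for illustration. The second example perturbs a double zero and shows that the dependence, although continuous, can be much more sensitive there: a coefficient change of size δ moves the zeroes by √δ.

```python
import cmath
from itertools import permutations


def quadratic_roots(a, b, c):
    """Both zeroes of a*x^2 + b*x + c (a != 0), as a list."""
    root = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]


def matching_distance(lams, mus):
    """min over permutations tau of max_j |lam_j - mu_tau(j)|."""
    n = len(lams)
    return min(
        max(abs(lams[j] - mus[tau[j]]) for j in range(n))
        for tau in permutations(range(n))
    )


delta = 1e-6

# Distinct zeroes: p(x) = x^2 - 3x + 2, constant term perturbed by delta.
simple = matching_distance(quadratic_roots(1, -3, 2),
                           quadratic_roots(1, -3, 2 + delta))

# Double zero: p(x) = x^2, perturbed to x^2 - delta with zeroes +-sqrt(delta).
double = matching_distance(quadratic_roots(1, 0, 0),
                           quadratic_roots(1, 0, -delta))
```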


Further Readings. For explicit bounds on the deviation e between the 
zeroes of p(x) and q(x) in terms of the coefficient separation 6 and the 
sizes of the coefficients, see L. Elsner, “On the Variation of the Spectra of 
Matrices,” Linear Algebra Appl. 47 (1982), 127-138. 


APPENDIX E 


Weierstrass’s theorem 


Let V be a finite-dimensional real or complex vector space with norm
$\|\cdot\|$. The ball of radius $\varepsilon$ about $x \in V$ is
$B_\varepsilon(x) = \{y \in V : \|y - x\| < \varepsilon\}$. A subset
$S \subseteq V$ is said to be open if for every $x \in S$ there is an
$\varepsilon > 0$ such that $B_\varepsilon(x) \subseteq S$. A subset
$T \subseteq V$ is said to be closed if the complement of T in V is open. A
subset $S \subseteq V$ is called bounded if there is an $r > 0$ such that
$S \subseteq B_r(0)$. Equivalently, T is closed if and only if the limit of
any convergent (with respect to $\|\cdot\|$) sequence from T is in T, and S
is bounded if S is contained in some ball of finite radius. A subset
$S \subseteq V$ is compact if it is both closed and bounded.

For $S \subseteq V$, a function $f : S \to \mathbf{R}$ may or may not attain
a (global) maximum or minimum value on S. However, under certain
frequently occurring circumstances, we may be sure that f attains a
maximum on S.


Theorem (Weierstrass). Let S be a compact subset of a finite-dimensional
real or complex vector space V. If $f : S \to \mathbf{R}$ is a continuous
function, then there exists a point $x_{\min} \in S$ such that

$$f(x_{\min}) \leq f(x) \quad \text{for all } x \in S$$

and there exists a point $x_{\max} \in S$ such that

$$f(x) \leq f(x_{\max}) \quad \text{for all } x \in S$$


That is, f attains its minimum and maximum on S. Of course, the values
$\max_{x \in S} f(x)$ and $\min_{x \in S} f(x)$ may each be attained at more
than one point of S. If either of the key assumptions (compact S and
continuous f) of Weierstrass’s theorem does not hold, the conclusion may
fail. The assumption that S is a subset of a finite-dimensional real or
complex vector space is not essential, however. With a suitable definition
of compact, Weierstrass’s theorem holds for a continuous real-valued
function on a compact subset of a general topological space.


References


Ait     A. C. Aitken. Determinants and Matrices. 9th ed. Oliver and Boyd,
        Edinburgh, 1956.
Bar75   S. Barnett. Introduction to Mathematical Control Theory. Clarendon
        Press, Oxford, 1975.
Bar79   S. Barnett. Matrix Methods for Engineers and Scientists. McGraw-
        Hill, London, 1979.
Bar83   S. Barnett. Polynomials and Linear Control Systems. Dekker, New
        York, 1983.
BB      E. F. Beckenbach and R. Bellman. Inequalities. Springer-Verlag, New
        York, 1965.
Bel     R. Bellman. Introduction to Matrix Analysis. 2d ed. McGraw-Hill,
        New York, 1970.
Boa     R. P. Boas, Jr. A Primer of Real Functions. 2d ed. Carus
        Mathematical Monographs, No. 13. Mathematical Association of
        America, Washington, D.C., 1972.
BPl     A. Berman and R. Plemmons. Nonnegative Matrices in the
        Mathematical Sciences. Academic Press, New York, 1979.
BSt     S. Barnett and C. Storey. Matrix Methods in Stability Theory. Barnes
        & Noble, New York, 1970.
CaLe    J. A. Carpenter and R. A. Lewis. KWIC Index for Numerical
        Algebra. U.S. Dept. of Commerce, Springfield, Va. Microfiche and
        printed versions available from National Technical Information
        Service, U.S. Dept. of Commerce, 5285 Port Royal Road, Springfield,
        VA 22161.
Chi     L. Childs. A Concrete Introduction to Higher Algebra. Springer-
        Verlag, Berlin, 1979.
Cul     C. G. Cullen. Matrices and Linear Transformations. Addison-Wesley,
        Reading, Mass., 1966.
Don     W. F. Donoghue, Jr. Monotone Matrix Functions and Analytic
        Continuation. Springer-Verlag, Berlin, 1974.




Fad     V. N. Faddeeva. Trans. C. D. Benster. Computational Methods of
        Linear Algebra. Dover, New York, 1959.
Fan     Ky Fan. Convex Sets and Their Applications. Lecture Notes, Applied
        Mathematics Division, Argonne National Laboratory, Summer 1959.
Fie     M. Fiedler. Spectral Properties of Some Classes of Matrices. Lecture
        Notes, Report No. 75.01R. Chalmers University of Technology and
        the University of Göteborg, 1975.
Fra     J. Franklin. Matrix Theory. Prentice-Hall, Englewood Cliffs, N.J.,
        1968.
Gan     F. R. Gantmacher. The Theory of Matrices. 2 vols. Chelsea, New
        York, 1959.
Gant    F. R. Gantmacher. Applications of the Theory of Matrices.
        Interscience, New York, 1959.
GKr     F. R. Gantmacher and M. G. Krein. Oszillationsmatrizen,
        Oszillationskerne, und kleine Schwingungen mechanische Systeme.
        Akademie-Verlag, Berlin, 1960.
GLR82   I. Gohberg, P. Lancaster, and L. Rodman. Matrix Polynomials.
        Academic Press, New York, 1982.
GLR83   I. Gohberg, P. Lancaster, and L. Rodman. Matrices and Indefinite
        Scalar Products. Birkhäuser-Verlag, Boston, 1983.
Grah    A. Graham. Kronecker Products and Matrix Calculus with
        Applications. Horwood, Chichester, U.K., 1981.
Gray    F. A. Graybill. Matrices with Applications to Statistics. 2d ed.
        Wadsworth, Belmont, Calif., 1983.
Gre     W. H. Greub. Multilinear Algebra. 2d ed. Springer-Verlag, New
        York, 1978.
GVl     G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins
        University Press, Baltimore, 1983.
Hal58   P. R. Halmos. Finite-Dimensional Vector Spaces. Van Nostrand,
        Princeton, N.J., 1958.
Hal67   P. R. Halmos. A Hilbert Space Problem Book. Van Nostrand,
        Princeton, N.J., 1967.
HJ      R. Horn and C. Johnson. Topics in Matrix Analysis. Cambridge
        University Press, Cambridge, 1989.
HKu     K. Hoffman and R. Kunze. Linear Algebra. 2d ed. Prentice-Hall,
        Englewood Cliffs, N.J., 1971.
Hou64   A. S. Householder. The Theory of Matrices in Numerical Analysis.
        Blaisdell, New York, 1964.
Hou72   A. S. Householder. Lectures on Numerical Algebra. Mathematical
        Association of America, Buffalo, N.Y., 1972.
HSm     M. W. Hirsch and S. Smale. Differential Equations, Dynamical
        Systems, and Linear Algebra. Academic Press, New York, 1974.
Jac     N. Jacobson. The Theory of Rings. American Mathematical Society,
        New York, 1943.
Kap     I. Kaplansky. Linear Algebra and Geometry: A Second Course. Allyn
        & Bacon, Boston, 1969.
Kar     S. Karlin. Total Positivity. Stanford University Press, Stanford,
        Calif., 1960.


Kel     R. B. Kellogg. Topics in Matrix Theory. Lecture Notes, Report
        No. 71.04, Chalmers Institute of Technology and the University of
        Göteborg, 1971.
Kow     H. Kowalsky. Lineare Algebra. 4th ed. deGruyter, Berlin, 1969.
LaH     C. Lawson and R. Hanson. Solving Least Squares Problems.
        Prentice-Hall, Englewood Cliffs, N.J., 1974.
Lan     P. Lancaster. Theory of Matrices. Academic Press, New York, 1969.
LaTi    P. Lancaster and M. Tismenetsky. The Theory of Matrices With
        Applications. 2d ed. Academic Press, New York, 1985.
Mac     C. C. MacDuffee. The Theory of Matrices. Chelsea, New York, 1946.
Mar     M. Marcus. Finite Dimensional Multilinear Algebra. 2 vols. Dekker,
        New York, 1973-75.
Mir     L. Mirsky. An Introduction to Linear Algebra. Clarendon Press,
        Oxford, 1963.
MMi     M. Marcus and H. Minc. A Survey of Matrix Theory and Matrix
        Inequalities. Allyn & Bacon, Boston, 1964.
MOl     A. W. Marshall and I. Olkin. Inequalities: Theory of Majorization
        and Its Applications. Academic Press, New York, 1979.
Mui     T. Muir. The Theory of Determinants in the Historical Order of
        Development. 4 vols. MacMillan, London, 1906, 1911, 1920, 1923;
        Dover, New York, 1966. Contributions to the History of
        Determinants, 1900-1920. Blackie, London, 1930.
Ner     E. Nering. Linear Algebra and Matrix Theory. 2d ed. Wiley, New
        York, 1963.
New     M. Newman. Integral Matrices. Academic Press, New York, 1972.
Nob     B. Noble. Applied Linear Algebra. Prentice-Hall, Englewood Cliffs,
        N.J., 1969.
Per     S. Perlis. Theory of Matrices. Addison-Wesley, Reading, Mass., 1952.
Rog     G. S. Rogers. Matrix Derivatives. Lecture Notes in Statistics, Vol. 2.
        Dekker, New York, 1980.
Rud     W. Rudin. Principles of Mathematical Analysis. 3rd ed. McGraw-
        Hill, New York, 1976.
Sen     E. Seneta. Nonnegative Matrices. Wiley, New York, 1973.
Ste     G. W. Stewart. Introduction to Matrix Computations. Academic
        Press, New York, 1973.
Str     G. Strang. Linear Algebra and Its Applications. Academic Press, New
        York, 1976.
STy     D. A. Suprenenko and R. I. Tyshkevich. Commutative Matrices.
        Academic Press, New York, 1968.
Tod     J. Todd (ed.). Survey of Numerical Analysis. McGraw-Hill, New
        York, 1962.
TuA     H. W. Turnbull and A. C. Aitken. An Introduction to the Theory of
        Canonical Matrices. Blackie, London, 1932.
Tur     H. W. Turnbull. The Theory of Determinants, Matrices and
        Invariants. Blackie, London, 1950.
Val     F. A. Valentine. Convex Sets. McGraw-Hill, New York, 1964.
Var     R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood
        Cliffs, N.J., 1962.


Wed     J. H. M. Wedderburn. Lectures on Matrices. American Mathematical
        Society Colloquium Publications XVII. American Mathematical
        Society, New York, 1934.
Wie     H. Wielandt. Topics in the Analytic Theory of Matrices. Lecture
        Notes prepared by R. Meyer. Department of Mathematics, University
        of Wisconsin, Madison, 1967.
Wil     J. H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon
        Press, Oxford, 1965.


Notation


$R$              the real numbers
$R^n$            real vector space of real n-vectors, $M_{n,1}(R)$
$C$              the complex numbers
$C^n$            complex vector space of complex n-vectors, $M_{n,1}(C)$
$F$              a field (usually R or C)
$F^n$            vector space (over F) of n-vectors with entries from F,
                 $M_{n,1}(F)$
$M_{m,n}(F)$     m-by-n matrices with entries from F
$M_{m,n}$        m-by-n complex matrices, $M_{m,n}(C)$
$M_n$            n-by-n complex matrices, $M_{n,n}(C)$
A, B, C, etc.    matrices; $A = [a_{ij}] \in M_{m,n}(F)$
x, y, z, etc.    column vectors; $x = [x_i] \in F^n$
$I$              identity matrix in $M_n(F)$
0                zero scalar, vector, or matrix

zero scalar, vector, or matrix 

matrix of complex conjugates of entries of $A \in M_{m,n}(C)$

transpose of $A \in M_{m,n}(F)$

Hermitian adjoint of $A \in M_{m,n}(C)$, $\bar{A}^T$

inverse of a nonsingular $A \in M_n(F)$

unique positive semidefinite square root of a positive
semidefinite $A \in M_n$

matrix of absolute values of entries of $A \in M_{m,n}$

Moore-Penrose generalized inverse of $A \in M_{m,n}$

classical adjoint (adjugate) of $A \in M_n(F)$

a basis of a vector space

ith standard basis vector in $F^n$ (usually)

coordinate representation of a vector v

basis representation of a linear transformation T


binomial coefficient, $n!/[k!(n-k)!]$

characteristic polynomial of $A \in M_n(F)$

condition number (for inversion, with respect to a given
matrix norm) of a nonsingular $A \in M_n$

determinant of $A \in M_n(F)$

direct sum

directed graph of $A \in M_n(F)$

dual norm of a vector norm $\|\cdot\|$

dual norm of a pre-norm $f(\cdot)$

eigenvalue of $A \in M_n$ (usually)

set of eigenvalues (spectrum) of $A \in M_n$; if A is Hermitian,
one usually takes $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$

factorial, $n(n-1)(n-2) \cdots 2 \cdot 1$

Geršgorin region of $A \in M_n$

group of nonsingular matrices in $M_n(F)$

Hadamard product of $A, B \in M_{m,n}(F)$

index of primitivity of a primitive $A \in M_n$

indicator matrix of $A \in M_{m,n}(F)$

Jordan block of size k with eigenvalue $\lambda$

Kronecker (tensor) product

minimal polynomial of $A \in M_n(F)$

$l_1$ (sum) norm on $C^n$; $l_1$ matrix norm on $M_n$

$l_2$ (Euclidean) norm on $C^n$; $l_2$ (Frobenius) matrix norm
on $M_n$

$l_\infty$ (max) norm on $C^n$; $l_\infty$ vector norm on $M_n$

$l_p$ norm on $C^n$

maximum column sum matrix norm on $M_n$

spectral matrix norm on $M_n$

maximum row sum matrix norm on $M_n$

numerical radius of $A \in M_n$ (usually)

orthogonal complement

permanent of $A \in M_n(F)$

rank of $A \in M_{m,n}(F)$

signum of a permutation

set of singular values of $A \in M_{m,n}$; one usually takes
$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min\{m,n\}} \geq 0$

largest singular value of $A \in M_{m,n}$, $|||A|||_2$

span of a subset S of a vector space

spectral radius of $A \in M_n$

spectrum (set of eigenvalues) of $A \in M_n$

submatrix of $A \in M_{m,n}(F)$ determined by the index sets $\alpha$, $\beta$

trace of $A \in M_n(F)$


Index 


a priori bounds, 337 
absolute 
convergence, 279, 300 
value of a complex number, 532 
vector norm, 285, 310, 365, 438 
additive property, of inner product, 
260 


adjacency matrix, of a graph, 168, 
523 


adjoint 
classical, 20 
Hermitian, 6 
adjugate, 20 
algebraic multiplicity, 58, 60, 138, 
141, 497, 499 
algebraically 
closed field, 41, 537 
simple eigenvalue, 371, 500, 508 
alternating sum, 8 
angle between vectors, 15 
annihilate, 142 
antilinear transformation, 250 
approximation problems, 332, 427 
arc, 357 
arg, 532 
argument of a complex number, 532 
arithmetic-geometric mean inequality, 
535 


augmented matrix, 11, 12 


back substitution, 159 
backward identity, 28, 207 
ball of radius r, 281, 541 




basis, 3 
change of, 30 
orthonormal, 16 
representation, 31 
bilinear form, 169, 175 
biorthogonality, 59 
Birkhoff’s theorem, 197, 527 
block 
diagonal matrices, 24 
triangular matrices, 25 
Bochner’s theorem, 394 
bordered matrix, 185 
bound norm, 294 
boundary, 282 
bounded set, 282, 541 
Brauer 
condition for invertibility, 381 
region, 380 
theorem, 380 
Brualdi 
condition for invertibility, 389 
region, 385 
theorem, 385, 387 


cancellation theorem, 78, 141 
canonical forms 
consimilarity, 251 
integer matrices, 158 
irreducible normal form, 506 
Jordan, 121 
polar, 156, 412ff 
rational, 154 
rational canonical, 156 


rational matrices, 158 
real Jordan, 152 
real orthogonal matrices, 108 
real skew-symmetric matrices, 107 
real symmetric matrices, 107 
singular value decomposition, 157, 
414 ff 
symmetric Jordan, 209 
triangular factorization, 157 
Carmichael and Mason’s bound on 
zeroes, 317, 318, 364 
Cassini, ovals of, 380 
Cauchy 
sequence, 274 
bound on zeroes, 316, 318, 364 
Cauchy-Binet formula, 22 
Cauchy-Schwarz inequality, 15, 261, 
277, 535, 536 
Cayley-Hamilton theorem, 86 
Cesaro summation, 524 
change of basis matrix, 32, 33 
characteristic equation, 87 
characteristic polynomial, 38, 86, 87, 
540 
Cayley-Hamilton theorem, 86 
definition, 38 
positive definite matrix, 403 
Cholesky factorization, 114, 407 
circulant matrices, 26 
classical adjoint, 20 
closed set, 282, 541 
closure, 282 
cofactor, 17 
column rank, 12 


combinatorially symmetric matrix, 523 


commutative ring, 95 
commutator, 98 

commuting family, 51, 81, 99, 139 
commuting matrices, 135 
compact set, 282, 541 


companion matrix, 147, 149, 316, 514 
compatible vector norm, 294, 324, 327 


completeness property of a vector 
space, 274 
complex 
conjugate, 531 
numbers, 531, 532 
compound matrix, 19 


concave 
function, 534 
logarithm of determinant, 466 
trace of inverse, 468 
condiagonalization, 244, 248 
condition number, 336, 340, 365, 366, 
374, 442 
coneigenvalue 
characterization, 246 
definition, 245 
coneigenvector, 245 
conformal, 17 
congruence 
*congruence, 220, 399, 464ff, 470 
simultaneous *congruence, 
canonical pairs, 236 
ᵀcongruence, 220
conjugate linear, 169 
conjunctive, 220 
consimilarity, 234, 244 
characterizations, 251 
to a real matrix, 255 
consistent 
linear equations, 12 
vector norm, 324 
constrained extrema, 34 
continuous 
dependence of eigenvalues, 540 
dependence of zeroes, 539 
function, maximum of a, 541 
contriangularization, 244 
convergence of a sequence, 269 
convergent matrix, 137, 298 
convex, 284 
combination, 535 
cone, 463 
function, 392, 533, 534-36 
hull, 533 
metric hull, 289 
sets, 533-36 
coordinate representation, 30 
correlation matrix, 400 
Courant-Fischer theorem, 179, 420, 
424, 472 
covariance matrix, 219, 239, 392, 
424 
Cramer’s rule, 21 
cycle, 357 
cyclic of index k, 514 




defect from normality, 316 
defective, 58 
deflation, 63, 83 
deleted absolute row sums, 344 
dependent, 3 
determinant, 7, 11, 398 
determinantal inequalities, 453, 467, 
476-86 
Fischer, 478 
Geršgorin, 351
Hadamard, 477 
Hadamard-Fischer, 485 
Minkowski, 482 
Oppenheim, 480 
Ostrowski-Taussky, 481 
Szasz, 479 
diagonal 
entries, equal, 77 
matrix, 23 
diagonalizable, 139, 145 
almost, 89 
by orthogonal similarity, 211 
definition, 46 
orthogonally, 101 
simultaneously, 49 
unitary, 101 
diagonalization 
by congruence, 228 
by consimilarity, 234, 244, 248 
by similarity, 46, 145 
by unitary congruence, 204 
by unitary consimilarity, 244, 245 
by unitary similarity, 101 
simultaneous, 52 
diagonally dominant, 349 
strictly, 302, 349 
difference scheme, 394 
differential equations, 132, 394 
elliptic, 239, 459 
hyperbolic, 239 
partial, 168, 216, 218 
dimension, 4 
direct sum, 24 
directed 
graph of a matrix, 357, 517, 522 
path, 357 
dominant eigenvalue, 506 
doubly stochastic matrix, 197, 527-29 
dual norm, 275, 410 


of Euclidean norm, 277 
dual pair, 278 
duality theorem, 287 


edges, 168 
eigenspace, 57 
eigenvalue 
algebraic multiplicity, 58, 60, 138, 
141, 497, 499 
algebraically simple, 371 
continuous dependence on matrix 
entries, 540 
definition, 35 
deflation to calculate, 63 
distinct, 48 
dominant, 506 
generalized, 213 
geometric multiplicity, 58, 60, 138, 
141, 497, 498 
ill-conditioned, 367 
inclusion region, 501 
inclusion theorem, 177 
index of, 131, 139, 148 
location, 343 
moments, 43 
nonnegative matrix, 489 
of a sum, 181, 184
perfectly conditioned, 367 
perturbation, 198, 343, 364 
positive definite matrix, 402 
positive matrix, 496 
power method to calculate, 62 
principal submatrices, 189 
variational characterization, 176-80 
well-conditioned, 367 
eigenvector, 57 
definition, 35 
left, 59, 371 
positive, 493, 494, 495, 513 
right, 59 
elementary divisors, 155 
elementary symmetric functions, 41 
elliptic differential operator, 239 
equilibrated, 283 
equivalence relation 
congruence, 221 
consimilarity, 251 
definition, 45 
vector seminorm, 262
equivalent 
matrices, 164 
orthogonally, 73 
real orthogonally, 73 
unitarily, 72 
vector norms, 273, 279 
error analysis, 335 
essentially 
nonnegative matrix, 506 
triangular matrix, 26 
Euler’s theorem, 111 
exponential of a matrix, 300 
extreme points 
closed convex set, 533 
doubly stochastic matrices, 528 
extreme ray 
definition, 463 
positive semidefinite matrices, 464 


factor analysis, 431 
factorizations, 156 
Cholesky, 114, 407 
complex skew-symmetric matrix, 
217 
complex symmetric matrix, 204 
LU, 158-65 
polar, 156, 411, 412ff 
product of two Hermitian matrices, 
172 
QR, 112, 164, 406 
singular value decomposition, 411 
Takagi, 250, 423, 466 
triangular, 157 
family 
commuting, 51, 81, 99, 139 
commuting real normal, 108, 112 
complex symmetric, 243 
diagonalizable symmetric, 217 
Hermitian, 172 
normal, 103 
simultaneous condiagonalization, 252
simultaneous diagonalization by *congruence, 239
simultaneous diagonalization by unitary ᵀcongruence, 243
simultaneous singular value decomposition, 426
simultaneous triangularization, 84
Fan 
k norms, 445 
theorem on eigenvalue location, 501 
Fejer 
trace theorem, on positive 
semidefinite matrices, 459 
uniqueness theorem, for elliptic 
partial differential equations, 460 
field, 1 
of values, 321, 332 
Fischer’s inequality, 478 
forms 
bilinear, 169, 175 
Hermitian, 174 
quadratic, 168, 174, 214, 466 
sesquilinear, 169 
forward substitution, 159 
fundamental theorem of algebra, 537 


general linear group, 14 
generalized 
coordinates, 227 
eigenvalue, 213 
inverse, 421 
matrix functions, 8 
matrix norm, 290, 320, see also 
vector norm 
geometric multiplicity, 58, 60, 138, 
141, 497, 498 
Geršgorin
circles, 346 
disc theorem, 344 
discs, 345, 353 
region, 345 
Givens’s method, 77 
Gram matrix, 407 
Gram-Schmidt process, 15, 148 
modified, 116 
symmetric analogue, 211 
graph, 168 
Greub and Rheinboldt inequality, 
452 
group 
finite Abelian, 510 
general linear, 14 
isometry, 266, 267 
orthogonal, 69, 71 
unitary, 69 
Grunsky inequalities, 202 


Hadamard 
exponential of a matrix, 461 
inequality, 199, 200, 477, 483 
powers of a matrix, 462 
product, 321, 455, 456, 457, 474, 475 
square root of a matrix, 462 
Hadamard-Fischer inequalities, 485 
Hahn-Banach theorem, 288 
half-spaces, 534 
Hankel matrix, 27, 202, 393 
Hausdorff moment sequence, 393 
Hermitian 
part, 109, 170, 399 
property, 260 
adjoint, 6 
Hermitian matrices 
*congruent, 223, 224 
product of three, 469 
product of two, 172 
Hermitian matrix, 104, 167, 169, 397 
analogous to real numbers, 170 
characterizations, 171 
partitioned, 175 
product with positive definite 
matrix, 465 
spectral theorem, 171 
Hessenberg matrix, 28, 157 
Hessian, 167, 392, 459, 534 
Hilbert matrix, 341, 401 
Hölder’s inequality, 276, 536
Hoffman-Wielandt theorem, 368, 419 
Holladay-Varga theorem, 520 
homogeneous, 259, 260, 290 
Hopf’s bound, 501 
Householder 
transformation, 74, 77, 78, 117 
method, 78 
hyperbolic differential operator, 239 
hyperplane, 534 


idempotent, 37, 148, 311 
identity 
backward, matrix, 28, 207 
Jacobi, 21 
matrix, 6 
Newton, 44 
parallelogram, 263 
polarization, 263 
Sylvester, 22
ill-conditioned, 336 
imaginary 
axis, 532 
part of a complex number, 531 
inclusion 
principle, 189 
region, 378 
indefinite matrix, 397 
independent, 3 
index 
citation, 553 
of an eigenvalue, 131, 139, 148 
of nilpotence, 37 
of primitivity, 519 
indicator matrix, 356 
induced matrix norm, 292 
by absolute vector norm, 310 
by monotone vector norm, 310, 365 
characterization, 302, 307 
inequality 
arithmetic-geometric mean, 535 
between matrix norms, 314 
between vector norms, 279 
bilinear, 473 
Cauchy-Schwarz, 15, 261, 277, 535, 
536 
determinant, 351 
Fischer, 478 
Greub and Rheinboldt, 452 
Grunsky, 202 
Hadamard, 199, 200, 477, 483 
Hadamard-Fischer, 485 
Hölder, 276, 535
Kantorovich, 444, 451, 452 
matrix norm, 290, 312 
Minkowski, 265, 536 
Minkowski’s determinant inequality, 
482 
numerical radius, 331 
Oppenheim, 480 
Ostrowski-Taussky, 468, 481 
positive definite function, 400 
power for numerical radius, 333, 334 
rank, 352 
Robertson, 468 
square root continuity, 411 
submultiplicative, 290 
Szasz, 479 
triangle, 259, 290
unitarily invariant matrix norm, 450 
unitarily invariant norms, 447 
weighted arithmetic-geometric 
mean, 535 
Wielandt, 442, 443 
inertia of a matrix, 221 
infinite series of matrices, 300 
inner product, 410 
characterization of norm derived 
from, 263 
definition, 260 
Frobenius, 332 
standard, 14 
usual, 14 
interior point, 282 
interlacing 
eigenvalues theorem for bordered 
matrices, 185 
inequalities, 182, 185, 187, 189, 200, 
404, 419 
property for singular values, 419 
theorem, 182 
interpolation, 29 
invariant 
factors, 154 
subspace, 51 
inverse, 14 
Brauer’s condition, 381 
Brualdi’s condition, 389 
diagonal dominance, 355 
errors in, 335 
generalized, 421 
irreducibly diagonally dominant, 
363 
Levy-Desplanques condition, 302, 349
minors of, 21 
Ostrowski’s condition, 381 
partitioned matrix, 18, 472 
series for, 301 
small rank adjustment, 18 
strict diagonal dominance, 302, 349 
invertible, 14 
irreducible matrix, 361, 362, 493, 
506-15 
minimal polynomial criterion, 515 
irreducible normal form, 506 
irreducibly diagonally dominant, 362
isometry, 68 
for a vector norm, 266 
group, 266, 267 
isomorphism, 4 


Jacobi
identity, 21
method, 76
Jacobian matrix, 218
Jordan
block, 121
matrix, 121, 129
normal form, 121
Jordan canonical form, 121, 129
real, 152
theorem, 126


Kakeya’s theorem, 318
Kantorovich’s inequality, 444, 451, 452
kernel, 456, 462
Kojima’s bound on zeroes, 319, 364
Krein-Milman theorem, 533, 534
Kronecker product, 474, 475
Krylov sequence, 164
Ky Fan, see Fan


Lagrange 
equations, 227 
interpolating polynomial, 29, 188, 
405 
interpolation, 29 
interpolation formula, 30 
Lanczos tridiagonalization, 164
Laplace
equation, 239
expansion, 7
least squares
approximation, 429, 431, 515 
solution, 421 
left eigenvector, 59, 371, 497, 504 
left Perron vector, 497 
Levy-Desplanques
condition for invertibility, 302, 349 
theorem, 302, 349 
limit 
of a sequence, 270 
point, 282 
line segment, 289
linear 
dependence, 3 
independence, 3, 407 
transformation, 5 
loop, 358 
LU factorization, 158-65 
lub norm, 294 


majorization, 199, 425, 446 
and unitarily invariant norms, 447 
characterizations, 197 
definition, 192 
eigenvalues by diagonal entries, 193, 
196 
product inequality, 199 
spectrum of a sum, 194 
Markov chain, 497 
Mason and Carmichael’s bound on 
zeroes, 317, 318, 364 
matrix 
adjacency, 168, 523 
almost diagonalizable, 89 
approximation problems, 427 
backward identity, 207 
block diagonal, 24 
block triangular, 25, 90 
bordered, 185 
change of basis, 32 
circulant, 26 
combinatorially symmetric, 523 
commuting, 135 
companion, 147, 149, 316 
complex orthogonal, 71, 72 
complex symmetric, 201 
compound, 19 
convergent, 137 
correlation, 400 
covariance, 219, 239, 392, 424 
diagonal, 23 
diagonalizable, 46, 139, 145 
doubly stochastic, 197, 527-29 
equivalent, 164 
essentially nonnegative, 506 
essentially triangular, 26 
exponential, 300 
function of a, 300 
Gram, 407 
Hankel, 27, 202, 393 
Hermitian, 109, 167, 169
Hessenberg, 28
Hessian, 392, 459, 534
Hilbert, 341, 401
identity, 6
indefinite, 397
indicator, 356
inertia of a, 221
infinite series, 300
irreducible, 361, 362
Jacobian, 218
Jordan, 121, 129
negative definite, 397
nilpotent, 139
nonderogatory, 135
nonnegative, see nonnegative matrix
normal, 100
normal skew-symmetric, 217
normal symmetric, 207
orthogonal, 71, 72
orthogonally diagonalizable, 211
orthostochastic, 197
permutation, 25
polynomial in a, 36
positive, see positive matrix
positive definite, see positive definite matrix
positive semidefinite, see positive semidefinite matrix
primitive, see primitive matrix
rank one, 61
real orthogonal, 66, 72, 107
real skew-symmetric, 107
real symmetric, 107
reducible, 360
scalar, 23
signature of a, 221
similarity, 44
skew-Hermitian, 100, 169
skew-orthogonal, 72
skew-symmetric, 109
skew-symmetric normal, 217
square root of, 54, 405
stochastic, 526-29
symmetric, 167, 201
symmetric diagonalizable, 211
symmetric normal, 207
symmetric unitary, 215
Toeplitz, 27, 136, 137, 394, 409, 456, 462
triangular, 24 
tridiagonal, 28, 174, 395, 409, 506 
unitary, 66, 109 
unitary characterizations, 67 
unitary symmetric, 215 
Vandermonde, 29 
weakly irreducible, 383 
matrix functions, generalized, 8 
matrix norm 
bound norm, 294 
Euclidean norm, 291 
Fan k norms, 445 
Frobenius norm, 291 
generalized, 320-42, see also 
vector norm 
Hilbert-Schmidt norm, 291 
induced, 307, 365 
induced by a similarity, 296 
induced by vector norm, 292, 294 
inequalities between, 314 
inequality for, 312 
Ky Fan k norms, see Fan k norms 
l_1 norm, 291
l_2 norm, 291
l_∞ norm, 292
lub norm, 294 
maximum column sum norm, 294
maximum row sum norm, 295 
minimal, 306, 307 
not convex set, 312 
operator norm, 294 
Schatten p norm, 441 
Schur norm, 291 
self-adjoint, 309 
spectral norm, 295, 441 
trace norm, 441 
unitarily invariant, 292, 296, 308 
max-min theorem, 179, 493, 496, 504 
maximal element, 384 
maximum 
column sum matrix norm, 294 
of a continuous function, 541 
row sum matrix norm, 295 
McCoy’s theorem, 94, 97 
Mercer’s theorem, 456 
metric convex hull, 289 
min-max theorem, 179, 493, 496 
minimal matrix norm, 306, 307
minimal polynomial, 142, 145
algorithm to compute, 144, 148
criterion for irreducibility, 515
definition, 143
diagonalization criterion, 145
of a direct sum, 149
minimally spectrally dominant, 330
Minkowski
determinant inequality, 482
inequality, 265, 536
minor, 17
principal, 17, 40, 398
signed, 17
modified Gram-Schmidt process, 116
modulus, 532
moments of eigenvalues, 43, 44
moment sequence
Hausdorff, 393
Toeplitz, 393
moments
algebraic, 393
trigonometric, 393, 455, 456
monic polynomial, 142
monotone vector norm, 285, 310, 450
monotonicity theorem, 182
Montel’s bound on zeroes, 317, 318, 364
Moore-Penrose generalized inverse, 421
multilinear, 11
multiplicity
algebraic, 58, 60, 138, 141, 497, 499
geometric, 58, 60, 138, 141, 497, 498


negative 
definite matrix, 397 
semidefinite matrix, 397 
Nehari’s theorem, 202 
Newton’s identities, 44 
nilpotent, 37, 139 
nodes, 168
nondefective, 58, 103 
nonderogatory, 58, 135, 147 
nonnegative, 259, 260, 290 
nonnegative matrix, 359 
applications, 487-90 
definition, 490 
doubly stochastic, 527-29 
eigenvalues, 489, 503-505, 507-15 
eigenvectors, 489, 503-505, 507-15
general limit theorem, 524 
irreducible, 507-15 
limit of powers, 489 
limit theorems, 500, 516, 524, 525 
Perron root, 505, 508 
Perron vector, 505, 508 
Perron-Frobenius theorems, 508-11 
primitive matrices, 515-24 
spectral radius, 489, 491-95, 
503-505, 507-15 
stochastic, 526-29 
nonsingularity, characterizations of, 14 
nontrivial cycle, 383 
norm 
and inversion, 301 
characterization of derived, 263 
compatible, 294, 324 
consistent, 324 
dual, 275, 410 
dual pair, 278 
matrix, see matrix norm 
minimally spectrally dominant, 330 
pre-norm, 272 
spectrally dominant, 324 
subordinate, 324 
vector, see vector norm 
normal matrix 
characterizations, 101, 108-12 
definition, 100 
perfectly conditioned eigenvalues, 
367 
real, 104ff 
normalized vector, 15 
null space, 5, 262 
numerical radius, 321, 331, 332, 333, 
334 
numerical range, 321 


open set, 282, 541 
operator norm, 294 
Oppenheim’s inequality, 480 
orthogonal 
complement, 16 
group, 69, 71 
vectors, 15 
orthogonal matrix, 157 
complex, 71, 72 
real, 66, 72 
skew, 72
orthogonality, 15 
orthogonally 
diagonalizable, 101 
equivalent, 73 
orthonormal, 15 
basis, 16 
orthostochastic matrices, 197 
Ostrowski 
condition for invertibility, 381 
region, 378, 379 
theorem, 224, 378 
Ostrowski-Taussky inequality, 468, 481 
ovals of Cassini, 380 


parallelogram identity, 263 
partitioned matrix
definition, 17
inverse, 18, 472
Schur complement, 472
Pearcy’s theorem, 76
perfectly conditioned, 336
permanent, 8
permutation, 368
invariant, 438
matrix, 25, 360
Perron
root, 497, 505, 508
theorem, 500
vector, 497, 505, 508
Perron-Frobenius theorems, 508-11
perturbation
eigenvalues, 198, 343, 364
linear equations, 335
theorems, 364
plane rotations, 74
Poincaré separation theorem, 190, 441
polar
coordinates in complex plane, 532
decomposition, see polar form
polar form, 156, 411, 412ff
examples and applications, 427ff
polarization identity, 263
polynomial
bounds for zeroes of, 316-19
characteristic, 38
continuous dependence of zeroes on coefficients, 539
for inverse, 88
in a matrix, 36, 135, 142
monic, 142 
similarity invariants, 154 
zeroes of a, 537 
zeroes of a real, 538 
poorly conditioned, 336 
positive, 259, 260, 290 
cone, 398 
definite function, 400, 401, 463 
definite kernel, 402, 462 
positive definite matrix, 250 
applications, 391-96, 459 
characteristic polynomial, 403 
characterizations, 402 
concavity of logarithm of the 
determinant, 466 
concavity of trace of the inverse, 
468 
definition, 396 
determinant criterion, 404 
determinantal inequalities, 476-86 
eigenvalues, 402 
kth root, 405 
ordering, 469ff 
square root, 405 
positive matrix, 359 
definition, 490 
eigenvalues, 495-503 
eigenvectors, 495-503 
left Perron vector, 497 
Perron root, 497 
Perron vector, 497 
Perron’s theorem, 500 
spectral radius, 495-503 
positive semidefinite matrix, 182 
definition, 396 
kth root, 405 
ordering, 469ff 
rank k, 457 
positive semidefinite ordering, 469
power 
inequality, 333, 334 
method, 62, 523 
pre-norm, 272, 322 
preorder, 384 
primitive matrix, 515-24 
characterizations, 516, 517 
definition, 516 
directed graph, 517
eigenvalues, 516 
Holladay-Varga theorem, 520
index of primitivity, 519 
Wielandt’s theorem, 520 
principal minor, 17, 40 
principal submatrix, 17 
eigenvalues, 189 
Procrustean transformation, 431 
property 
L, 97, 100 
P, 100 
SC, 355, 358, 359, 362 
pure imaginary complex number, 532 


QR
algorithm, 114
factorization, 112, 164, 406
quasi-linearization, 191, 453, 455, 486 


range, 5 
rank, 12 
equalities, 13 
inequalities, 13, 175, 352, 458 
rank one 
approximation, 427 
limit, 499 
matrix, 61 
rational 
canonical form, 156 
form, 154 
Rayleigh-Ritz 
ratio, 176 
theorem, 176, 422 
real 
axis, 532 
Jordan canonical form, 152 
part of a complex number, 531 
rectangular coordinates in complex 
plane, 532 
reducible matrix, 360 
residual vector, 338, 373, 374 
reverse order law, 6 
Riemann sum, 462 
right 
eigenvector, 59 
half-plane, 532 
Robertson’s inequality, 468 
Romanovsky’s theorem, 517 
rotation problem, 431, 435
round-off errors, 335 
row rank, 12 
row-reduced echelon form (RREF), 10 


scalar 
matrix, 23 
product, 14 
Schatten p norms, 441 
Schur 
complement, 21, 472 
majorization theorem, 193 
norm, 291 
product, see Hadamard product 
product theorem, 455, 458 
unitary triangularization theorem, 
79, 83 
selection principle, for unitary 
matrices, 69, 117, 416 
self-adjoint 
matrix norm, 309 
vector norm on matrices, 450 
seminorm, 259 
sensitivity 
of eigenvalues, 372 
of eigenvectors, 373 
of solutions of linear equations, 339 
separating hyperplane theorem, 534 
separation theorem, 190 
sesquilinear forms, 169 
shift operator, 38 
signature of a matrix, 221 
signum (sgn), 8 
similarity, 44 
inverse to adjoint, 70 
matrix and its transpose, 134 
to a real matrix, 172 
to a real matrix, AA, 253 
to adjoint, 172 
simultaneous 
*congruence, canonical pairs, 236 
condiagonalization, 252 
singular value decomposition, 426 
triangularization, 81, 84, 94 
simultaneous diagonalization, 49, 52 
by *congruence, 240, 464 ff 
by congruence, 228, 241, 250 
by unitary congruence, 228, 235 
characterization of, 228 
singular, 14
singular value decomposition, 157, 
205, 325, 411, 414ff, 421 
examples and applications, 427ff 
simultaneous, 426 
singular values, 205, 415 
largest, 421 
of a product, 423 
of a sum, 423
perfectly conditioned, 418 
variational characterization, 420 
skew orthogonal, 72 
skew-Hermitian 
matrix, 100, 169 
part, 109, 170, 399 
skew-symmetric matrix, 397 
solution-equivalent systems, 11 
span, 2, 3 
Specht’s theorem, 76 
spectral 
characteristic, 330 
norm, 295, 308, 309, 421, 441, 445, 
450 
spectral radius, 35, 198, 296, 313, 348, 
489 
as a limit of matrix norms, 297 
as a limit using norms or pre-norms, 
299, 322 
spectral theorem 
Hermitian matrices, 104, 171 
normal matrices, 101, 425 
spectrally dominant, 329 
spectrum, 35 
of a sum, 92 
of a sum by majorization, 194
square root of a matrix, 54, 405 
continuity of, 411 
standard 
basis, 4 
inner product, 14 
stochastic matrix, 526-29 
strictly diagonally dominant, 302, 
349 
strongly connected directed graph, 
358, 362, 383 
submatrix, 4 
eigenvalues, 189 
principal, 17, 397 
submultiplicative, 290 
subordinate vector norm, 324
subspace, 2 
invariant, 51 
Sylvester’s identity, 22 
Sylvester’s law of inertia, 223, 238 
analogue for symmetric matrices, 225 
homotopy proof, 242 
symmetric 
gauge function, 438, 445 
Jordan canonical form, 209 
matrices, ᵀcongruent, 225
symmetric matrix, 167, 397 
complex, 201 
diagonalizable, 211 
every matrix similar to a, 209 
product of two, 210 
real, 169, 218 
Szasz’s inequality, 479 


Takagi’s factorization, 204, 423, 466 
as consimilarity analogue of spectral 
theorem, 250 
Hua’s proof, 217 
Siegel’s proof, 216 
Taussky’s theorem, 363 
Toeplitz 
matrix, 27, 136, 137, 394, 409, 456, 
462 
moment sequence, 393 
topological notions, 282, 288 
trace, 40, 175, 398 
norm, 441, 445 
zero, 77 
transpose, 6 
transposition, 25 
triangle inequality, 259, 290 
triangular 
factorization, 157 
matrices, 24 
triangularization 
by consimilarity, 244, 245 
by unitary congruence, 203 
by unitary consimilarity, 244, 245 
by unitary similarity, 79 
orthogonal, 84 
simultaneous, 81, 84, 94 
tridiagonal matrix, 28, 174, 395, 409, 
506 
tripotent, 148 
trivial cycle, 358
truncation errors, 335 


unit 
disc, 532 
sphere, 273 
vector, 15 
unit ball, 273, 281 
compact, 283, 284 
convex, 284 
equilibrated, 284 
properties of, 283 
unitarily diagonalizable, 101 
unitarily equivalent 
definition, 72 
equal diagonal entries, 77 
Specht’s theorem, 76 
to upper triangular matrix, 79 
unitarily invariant 
norm, 292, 296, 437 
vector norm, 265, 267 
unitarily invariant matrix norms, 296, 
308 
set is convex, 450 
unitarily invariant vector norm, on 
matrices, 437, 441, 445 
Von Neumann’s characterization, 
438 
when a matrix norm, 450 
unitary 
group, 69 
matrices, selection principle, 117 
unitary matrix, 66-72, 157 
characterizations, 67 
definition, 66 
selection principle, 69 
upper half-plane, 532 


Vandermonde matrix, 29 
variational characterization, of 
eigenvalues, 176 
vector 
normalized, 15 
unit, 15 
vector norm 
absolute, 285, 310, 365, 438 
algebraic properties, 268 
analytic properties, 269 
Cauchy sequence with respect to a, 274
characterization via unit ball, 284
compatible, 324, 327
consistent, 324
convergence with respect to a, 269
definition, 259
derived from inner product, 262
dual, 275
duality theorem, 287
equivalent, 273, 279
Euclidean, 264
generalized matrix norm, 290
geometric properties, 281
inequalities between, 279
isometry, 266
l_1 norm, 265
l_2 norm, 264
l_p norm, 265
l_∞ norm, 265, 322
L_1 norm, 266
L_2 norm, 266
L_p norm, 267
L_∞ norm, 267
Manhattan norm, 265
max norm, 265
monotone, 285, 310, 365, 449
on matrices, 320-42
polyhedral, 282
pre-norm, 272, 322
spectrally dominant, 329
subordinate, 324
sum norm, 265
uniformly continuous, 271
unit ball, 273, 281
unit sphere, 273
unitarily invariant, 265, 267
unitarily invariant on matrices, 437
weakly monotone, 285
vector seminorm, 259
vector space, 2
complete, 274
Von Neumann
inner product theorem, 263
unitarily invariant norm theorem, 438


weak minimum principle, 460 
weakly 
connected directed graph, 383 
irreducible matrix, 383 
monotone norm, 285 
Weierstrass’s theorem, 541 
weighted arithmetic-geometric mean 
inequality, 535 
well conditioned, 336 
Weyl’s theorem, 181, 184, 367, 419, 
423 
Wielandt 
inequality, 442, 443 
example, 522 
theorem, 520 
Wielandt-Hoffman theorem, 368, 419 
Witt cancellation theorem, 78, 141 


